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Analytical Approximations to 
Approximations in the 


Chebyshev Sense 


By SIDNEY DARLINGTON 
(Manuscript received February 20, 1969) 


This paper concerns approximation in the Chebyshev, or minimax 
sense such that (1) a minimax approximation implies a maximum number 
of zero error points separated by equal error extrema, and (it) the approx- 
imating function can be so formulated that the disposable parameters 
are all the coefficients in a polynomial, which may however be part of a 
more complicated function the rest of which is prescribed. Weighted minimax 
polynomial approximations can be included, by multiplying the approx- 
imated and approximating functions by the weight factor. Analytic methods 
are described which yield approximately equal error extrema. They are 
sufficiently simple so that they may sometimes compete with currently 
used iterative numerical methods, especially when the degree of the dis- 
posable polynomial ts large. Their most probable utility concerns explora- 
tions of available accuracies over wide ranges of design parameters such 
as degree of disposable polynomial, interval of approximation, and coef- 
ficients in prescribed parts of the approximating function. 


I. INTRODUCTION 


This paper concerns approximation in the Chebyshev sense, over a 
prescribed interval x, S x S x, of a continuous real variable x. As 
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defined, an approximation in the Chebyshev sense is a minimax approx- 
imation—one in which the maximum error is as small as is possible 
within given constraints on the approximating function. Minimax 
approximations in which errors are weighted by a prescribed function 
of the independent variable can also be treated as Chebyshev approx- 
imations, by multiplying the approximated and approximating functions 
by the weight function. 

Frequently, but not always, approximation in the Chebyshev sense 
implies an error of the “equal ripple” sort illustrated in Fig. 1—that is, 
a sequence of equal positive and negative extrema with monotonic 
variations in between. General necessary and sufficient conditions for 
this are not known. However, the following conditions are sufficient: 
the p disposable parameters of the approximating function are to be 
such that the approximation error can be made zero at p arbitrary 
points within the approximating interval. Referring to Fig. 1, the 
arbitrary error points divide the approximation interval into p + 1 
segments. There is to be a particular division such that the error function 
achieves its maximum magnitude p + 1 times—at the two edges of 
the approximation interval and once within each of the » — 1 interior 
segments. There are to be no other local extrema within the approx- 
imation interval. Generally, shrinking any one of the p + 1 segments 
(by bringing two zero error points closer together or one closer to an 
edge of the approximation interval) tends to reduce the corresponding 
error extremum. Conditions are to be such that all the p + 1 equal 
extrema can be reduced simultaneously only by shrinking all the 
p + 1 segments, which is impossible without shrinking the given approx- 
imation interval. These conditions are encountered in many practical 
problems and are assumed here. Thus we are concerned only with equal 
ripple approximations like Fig. 1. 

Exactly equal ripple approximations have long been known for a 
very few special cases (which have been useful for example in filter 
design). Iterative numerical methods have been developed for the 
solution of various more general problems and are described in text- 


ERROR 


C- 


va Lb 


Fig. 1 —- An equal ripple error function (p zeros; p + 1 segments; p + 1 extrema). 
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books such as Ref. 1. In contrast, this paper describes analytical pro- 
cedures which yield error extrema of approximately equal amplitude. 
Their full range of validity has not been determined. However, they 
are clearly appropriate for a substantial, although poorly defined class 
of problems. It is characterized further later. 

Useful applications are likely to concern equal ripple problems which 
have not been solved exactly by analysis and which involve so many 
disposable parameters that iterative numerical solutions are likely to 
be more costly. The most useful applications probably concern prelim- 
inary explorations over primary design parameters (such as intervals 
of approximation, magnitudes of errors, and degrees of approximating 
functions) before numerical refinement of specific designs. Accordingly, 
this paper emphasizes relatively simple means for approximating equal 
ripples and says little about more complicated higher order approx- 
imations. 

The procedures apply only to approximating functions characterized 
as follows. The disposable parameters must be all the coefficients in a 
polynomial (which may have been obtained, however, by some sort of 
transformation on the original independent variable and/or the approx- 
imating function). This is referred to as the disposable polynomial. 
On the other hand, the disposable polynomial may be only a part of a 
more general approximating function the rest of which is prescribed 
in advance (for example, the numerator of a rational fraction with a 
prescribed denominator). Weighted as well as unweighted minimax 
approximations are included. For some problems, closed form formulas 
are obtained for approximate error size as functions of the degree of 
the disposable polynomial, usable for degrees of any size. For other 
problems, the error size is related to an eigenvalue of a certain matrix 
equation, but the order of the matrix may be small even though the 
degree of the disposable polynomial is arbitrarily large. 

A primary concern here is the distinction between simple truncation 
of infinite series of Chebyshev polynomials and approximation in the 
Chebyshev or minimax sense. The functions which we are to approx- 
imate can be expanded into infinite series of Chebyshev polynomials. 
Approximations with polynomials of degree n can be obtained by 
simply truncating the infinite series after the terms of degree n. How- 
ever, simple truncation does not usually give an approximation of the 
minimax sort. A polynomial of degree n which approximates the given 
function in the minimax manner can be represented as a linear combi- 
nation of Chebyshev polynomials, but the coefficients are usually 
different from those in the truncated infinite series. 

One way to approach approximation in the Chebyshev sense is to 
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start with the truncated series of Chebyshev polynomials. Then cor- 
rections to the coefficients are determined, to obtain equal ripple error 
functions. Such a procedure has been used before, for example in Refs. 
2 through 4, and is used here. Departures from the previous work 
known to the author include simple approximations to ideal solutions 
formulated for more general approximating functions and for weighted 
as well as unweighted minimax approximations, as opposed to more 
rigorous analyses of more restricted problems. 

Sometimes truncation of an infinite series of Chebyshev polynomials 
yields an approximately equal ripple error function without further 
adjustment of the coefficients. The procedures for adjusting the coef- 
ficients, described herein, sometimes also give an initial insight into 
whether or not adjustments are needed. 

It is interesting to note that some 35 years ago a conference was 
held in the office of T. C. Fry, at Bell Telephone Laboratories, to 
consider some filter patents offered for sale by W. Cauer. One of the 
patents disclosed Cauer’s equal ripple image impedance and transfer 
functions, which soon became famous among circuit theorists, but did 
not include proofs or derivations. At the conference, S. A. Schelkunoff 
asserted a very simple principle which enabled him to confirm and 
interpret Cauer’s formulas. However, it did not explain how Cauer 
might have derived or discovered the formulas. The principle applies 
also to more general equal ripple approximations. It does not, by 
itself, solve the approximation problem, but it does furnish a starting 
point from which to develop procedures which do. We call it Schelkunoff’s 
principle. 

Section II describes Schelkunoff’s principle. Section III solves two 
problems for which exactly equal ripple solutions are easily found. 
Section IV develops general procedures, whereby approximate so- 
lutions can be obtained for a large class of problems. Section V further 
clarifies the general procedures by means of examples. 

Various aspects of the procedures described here bear some relation 
to other work. Section VI notes some of these relationships. Finally, 
Section VII reviews and summarizes the general conclusions, including 
a comment on the possibility of generalizations to disposable rational 
fractions. 


Il. SCHELKUNOFF’S PRINCIPLE 


Consider first a function T,(x) proportional to a Chebyshev poly- 
nomial, defined by 
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Tit) = Bn cos (n cos’ 2). (1) 


It is illustrated in Fig. 2a, for n = 4. It may be regarded as an “equal 
ripple” approximation to zero, over the interval —1 < x < +1, by 
a polynomial of degree n in which the coefficient of 2” is required to 
be K,, . Let 


x = COS ¢. (2) 


Substitution in equation (1) gives 


K,, 
T(t) = T.@) = 5n-7 Cos ny. (3) 


The new function is illustrated in Fig. 2b, again for n = 4. Note that x 
is periodic in » with period 27 and T,,(y) is periodic in ¢9 with period 
2x/n. Thus there are n periods of T,,(y) in each period of x. 

Stated with a little more detail, we have this situation: The original 
function T,(x) has “equal ripples” in the sense of equal extrema. How- 
ever, the extrema are not uniformly spaced and hence the ripples 
differ as to width. The periodic transformation from zx to ¢ has two 
important properties. As ¢ Increases, x sweeps back and forth across 
the approximation interval, —1 S$ x S +1. In each interval in which 
x varies monotonically from -+-1 to F1 the ¢ scale is a distortion of the 
x scale such that the ripples of 7,(y) are uniformly spaced and are 





xvc=-t 
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Fig. 2 — Illustrating Schelkunoff’s principle. 
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also otherwise identical. The ripples could not be made identical by 
any distortion of the x scale alone if T,(x) had unequal maxima or 
unequal minima. This is a special case of Schelkunoff’s principle. 

More generally, let E(x) be a function of x with the following prop- 
erties over an interval x, S « S x, : The function E(z) is real and single 
valued; there are a number of local maxima, all equal; there are a 
number of local minima, all equal; the equal extrema for the interval 
include the end points E(z,), E(z,) (at which dE/dx need not =0).* 
Then Schelkunoff’s principle asserts the existence of a transformation 


x = I(g) (4) 


with the following properties: The original variable x is periodic in the 
new variable g; as ¢ increases x sweeps back and forth over the given 
interval x, S x S x,, monotonically each way once each period; the 
periodic function 


E(y) = ElT(¢)] (5) 


has a number of periods in each period of x, equal to one less than the 
number of extrema of E(x) in the given interval of x (including the end 
points). In applications to approximation in the Chebyshev sense, 
E(x) and L(y) represent the equal ripple error, as functions of x and ¢. 

The transformation x = I(¢) clearly is not unique, for there are 
obvious transformations on ¢ itself which retain the desired character 
of L(y). For example, ¢ can be replaced by ¢ + q(v), where q(¢) is 
periodic with the same period as /(y) and is such that y + q(¢) is 
monotonic in v. When E(x) is continous (in the given interval of x), 
a particular » + q(¢) will make Hy) sinusoidal. 

We do not attempt a very general, rigorous proof of Schelkunoff’s 
principle; we merely use it as a guide to a strategy for solving minimax 
problems. However, a demonstration of the principle for a specific 
class of problems will be implicit in what follows, for we shall find 
transformations which do in fact change our equal ripple errors into 
sinusoidal errors. 

In later sections we will again use the transformatlon (2), or a gen- 
eralization for end points other than « = +1. Usually, however, it 
will not be a Schelkunoff transformation. We will use it to transform 
the disposable polynomial in x into a finite Fourier series in ¢. The coef- 


* Problems can be found such that minimax approximations have equal ex- 
trema which do not include both end points. Then the number of local extrema 
for the approximation interval is abnormally large when the end points are 
counted. Such problems are not considered here. 
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ficients of the Fourier series are to be chosen in such a way that the 
overall approximation is approximately sinusoidal on a distortion of 
the ¢ scale. Similar strategies have been used before, for example in 
Refs. 2 through 4. 

Means for determining the distortion of the ¢ scale and the adjustment 
of the Fourier coefficients are introduced by means of two examples in 
the next section. 


III. TWO GENERALIZATIONS OF CHEBYSHEV POLYNOMIALS 

The two problems described below are solved exactly. The form of 
the solutions suggests approximate solutions to more general problems. 
3.1 A Rational Function Generalization of Chebyshev Polynomials 


Consider the following generalization of Chebyshev polynomials: Let 





P(x) 

Ty,(t) = D(x) (6) 
in which P(x) is a polynomial of degree n and D(x) is a polynomial of 
degree < n. Suppose D(z) is prescribed in advance and P(z) is to be 
chosen in such a way that Tp,(x) has equal ripples like those of a 
Chebyshev polynomial in the interval —1 < x < +1. More specif- 
ically, require that Tp,(z) = -k1 at n — 1 local extrema within the 
interval —1 S x S +1 and at the end points z = +1. Real zeros of 
D(z) are to be excluded from the interval —1 < x < +1. The con- 
ditions on the extrema insure that all n of the zeros of P(x) will be in 
the interval. 

Let y be defined again by equation (2) and note that the real axis 
in the ¢ plane corresponds to the real interval —1 < x S +1 in the 
x plane. If cos ¢ is written in terms of exponentials, the polynomial 
P(x) can be related to » by 


P(x) = Pe’’) + Pe '’) (7) 


in which P(-) is a polynomial of the same degree, n, as P(-). Given the 
coefficients of P(-) it is a simple matter to compute the coefficients of 
P(-). We shall consider our problem solved when we have found the 
coefficients of P(-) required for our equal ripple conditions. 

It is convenient to relate the prescribed denominator D(x) to ¢ in 
the following slightly different way: 


D(x) = Dee’*) De®) 
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in which D(-) is a polynomial of the same degree, S n, as D(-). If 
D(-) and D(-) are written in factored form there is a one to one corre- 
spondence between factors. Thus if (xz, — x) is a factor of D(z) and 


(1 — y,e'*) a corresponding factor of D(e*’), 
te— 2x = M,(1 — ye*)(1 — ye”) | (9) 
in which M, is a constant scale factor. 
By equation (2), e'* = +1 at x = +1, and hence 
tk$l=M,(1+7.)’. (10) 


Given x, , two solutions for y, are easily obtained, for which |, | is 
respectively < 1 and > 1. (Exclusion of zeros x, of D(x) from the real 
interval —1 < x S +1 removes the possibility of | y.| = 1.) We need 
the solution for which | y, | < 1, for reasons which will soon be apparent. 
From equation (11), with the sign of the square root chosen for | y, | < 1, 


eke ee z= 11! Ea 
jou. [mat f RO aeT e w aD 
The scale factor /, need not concern us at this time. 


The function Tp,(z) can now be mapped into a function T'p,(¢) in 
terms of equations (7) and (8). 





Tos(t) = Tale) = Soe (12) 


By Schelkunoff’s principle, our requirements on the extrema of Tp,(x) 
imply that 7'p,(y) has the following special form 


P@'*) + PC) 
Dee'*) De**) ’ 
a oa oe + ag ees Nee ; (13) 
The variable » + (1/n)f(¢) is a distortion of the ¢ scale for Schelkunoff’s 
principle, for which f(g) is to be periodic in ¢ with period 27 and » + 
(1/n)f(¢) is to vary monotonically with ¢. 
Given f(y) one can easily find P(e**) by equation (13). The problem 


is to find an f(¢) for which P(-) is a polynomial of degree n. From 
equation (13) 


1 ilnet+f(e)) ig -i¢g 
Pon + Pe) = | ee ee) | ay 
ae Eeonieee ten D,(e**) D,(e**) 


If P(e**) is to have no terms in e*’® with o > n, e*’‘* needs to cancel 


T only) a 
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out D,(e**). This suggests 


eit 6?) = oe (15) 


(and note that this does make f(y) real when ¢ is real). Substitution 
in equation (14) gives 


P@'*) + P@**) = ge"? D7E**) + 4°" DE"). (16) 


Expanding the right side gives a polynomial in e*®. When polynomial 
D(-) is of degree < n (as assumed) there will be no powers of e*” out- 
side the range —n to +n. Collecting positive powers (and half the 
constant term) gives P(e**). 

The function f(¢) determined by equation (15) is periodic in ¢ with 
period 27 provided the zeros of D(A) lie outside the unit circle in the 
\ plane, which is assumed by equation (11). It is easily shown that the 
same condition makes ¢ + (1/n)f(¢) monotone in ¢. Once P(-) is known 
it is a simple matter to find P(x) by means of equations (2) and (7). 
It is probably simplest to omit the scale factor MW, of equation (10) in 
the initial formulation. This does not affect the ratio in equation (15), 
but only the scale factor of the polynomial P(x), which can be corrected 
later on [for example to meet the condition Tp,(1) = 1]. 

Obvious generalizations of the problem include the following: For 
extrema A + J (instead 0 + 1) use 


F(z) = A + JT>, (2). (17) 
In a more general interval of x, say x, S x S x , replace equation (2) by 


Gee Bas ey ha 
oar oa + 7 608 @ (18) 


x= 


and change equation (10) to 
Le UM = M,(1 — Ye) Le — % = M,(1 id Ve) (19) 
The function F(x) defined by equation (17) has long been used by 
filter theorists, but previous derivations have been quite different.° 


The form of equation (16) suggests a similar solution to the problem 
described below. 


3.2 An Irrational Generalization of Chebyshev Polynomials 


Now let 


Teal) = SCF (20) 
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in which P(z) is again a polynomial of degree n but S(z) is a poly- 
nomial of degree S 2n. Suppose S(x) is prescribed and that T;,(x) is 
to meet the same conditions as to extrema as Tp,(z) in the previous 
subsection. In place of equation (8), we can now use 


S(z) = S@’*)SE**) 
to determine a polynomial S(-). We can then replace equation (15) by 


goo « [Se a 


and then equation (16) by 
Pe’*) + P@**) = 4e'"*S(e**) + 4e7'** S(e*’). (22) 


This makes P(-) again a polynomial of degree n. Note that Ts,(«) 
cannot be used in place of Tp, in equation (17), with A # 0, without 
changing the polynomial character of the numerator. 


IV. GENERAL FORMULATIONS 


This section shows how a large class of minimax approximations can 
be approximated by generalizing the manipulations described above. 
In Section V, we clarify the general procedures further by providing 
examples. 


4.1 Unweighted Minimax Approximations 
Let 


Piz) = F@) +e), 4% StS (23) 


in which P(z) is a disposable polynomial of degree n, F(x) is a given 
function to be approximated by P(x) in the interval x, S x S x, and 
e(x) is the error in the approximation. For what P(x) is e(x) smallest in 
the minimax sense? We assume that the minimax e(z) has the equal 
ripple form (Fig. 1) and we seek only approximations to equal ripples. 
We also restrict the class of applicable functions by certain further 
assumptions which can best be introduced a little later. 

As before, let x and ¢ be related by equation (18), so thatz, 2S 2, 
maps into real y, and replace P(x) by 


P(x) = P(e’*’) + Pee’). (24) 


If P(-) is again a polynomial of degree n, it is uniquely determined by 
P(-) [and x, , % in equation (18)]. Now, however, we find it expedient 
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to permit P(z) to include negative powers of z, up to z”. Thus, in 
equation (24), 


Pe). = Pe (25) 
This P(-) is not uniquely determined by P(-). However, P(-) zs uniquely 
determined by P(-), and it is still a polynomial of degree n. We solve 
the approximation problem by finding a suitable P(-), from which 
P(-) can be easily determined. 

We require that the mapping from z to » maps F(z) into a function 
of » with a convergent Fourier series. This amounts to requiring that 
F(x) can be expanded into a convergent series of Chebyshev poly- 
nomials (defined to fit the given interval of approximation). Because 
equation (18) is even in g, the Fourier series has only cosine terms. 
Then, replacing cosines by sums of exponentials, 


F(z) = F@'*) + F€**) 
. (26) 
Re") = 702" 
o=0 
in which the series expansion of F(e**) converges when ¢ is real. 
The desired equal ripple error can be written 


e(x) = ecos[(n + l)o + fy)] (27) 


in which f(¢) is again periodic in ¢ and represents the distortion of the 
gy scale per Schelkunoff’s principle. In an equivalent exponential form 


e(x) = He’) + E@™**) 
E¢é*) = Base (28) 


The exponent 7(n + 1)y, instead of ing as in the previous section, 
reflects the following circumstances: If the Chebyshev polynomial 
series corresponding to equation (26) is truncated after the polynomial 
of degree n, the first omitted polynomial is of degree n + 1. If all the 
other omitted polynomials have sufficiently small coefficients, the 
truncation error will approximate E(x) of equation (23) with f(¢) = 0. 
Note also that a disposable polynomial of degree n has n + 1 disposable 
coefficients. These are an example of the p disposable parameters in 
the more general description of equal ripple errors in Section I. 

Using equations (24), (26) and (28) in equation (23) gives 


Pe'*) + P@'*) = Fe*) + FE") + H@*) + EC). (29) 
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We arbitrarily equate the terms in exp (+7¢) separately, so that 
P(e’*) = Fe'*) + Ee’). (30) 


If equation (80) is satisfied at all real ¢ so is the corresponding equation 
in exp (—7y). Thus a solution of equation (80) is a solution of equation 
(29). But the converse is not necessarily true. Frequently, an exactly 
equal ripple approximation corresponds to a solution of equation (29) 
which is not a solution of equation (80). However, we will find that 
approximations with approximately equal ripples can frequently be 
derived from equation (30), and in a much simpler way. 

In equation (30), expand P(-), F(-), E(-) per equations (24), (26) 
and (28). The result can be rearranged as follows: 


2n+1 


Co 
~id € jifte) he, 
Ge = 5° + Dee 
A=1 A=0 


Gy = Pasi-n — Casi-a ; ASn+1dA8Sn+1; 
= Pass-r , nt+t1l<rAS2n+1. (31) 


In this equation, C,+:4, is fixed by equation (26) but P,4:-, is a dis- 
posable parameter in equation (25). Thus we seek an ¢ and exp [7f(¢)] 
with the following properties: First, (¢«/2) exp [7f(¢)] is to be expandable 
in terms of positive and negative powers of exp (zg). Second, the coef- 
ficients of positive powers are to cancel the corresponding coefficients 
C414, In equation (31). Third, the coefficients of negative powers are 
to be such that, with an appropriate e, |exp [if (y)]| = 1 when ¢ is real, 
so that f(y) is real and the error extrema are equal per equation (27). 
Sometimes it turns out that there are no negative powers beyond 
—(2n + 1). Then the left side of equation (31) can be adjusted to match 
the right side. In many other problems, approximately equal error 
extrema can be obtained by simply ignoring terms in negative powers 
beyond —(2n + 1). 

Now consider the class of functions F(-) such that, in equation (31) 


Bee'*) 


ee (32) 


Dy Crease” 
A=0 
in which A(-) is a polynomial of degree m and B(-) is a polynomial 
of degree yu. If the series converges, as assumed, the zeros of A(z) will 
lie outside the unit circle. 
Under conditions which we shall examine further, the appropriate 
exp [if(¢)] is now as follows: 
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fe) Ae "")X€*) 
Ace’*)X(**) 
in which 6 is small (at real ¢) and X(-) is a polynomial determined by 


two further conditions. First, the zeros of X(z) must lie outside the unit 
circle. Second ¢ and X(-) must be such that 


e A@**)XE'*) | BE’) _ &**NE™**) 
2 Ae’*)X(e**) — Ae'*) X(e**) 


+ 8(e**) (33) 


é 


(34) 


in which N(-) is a polynomial. Let us examine the implications first 
and the existence of such an e and X(-) thereafter. 

When ¢ is real exp (iy) and exp (—7¢) are conjugates, and so are 
identical polynomials in these two variables. This makes f(y) real in 
equation (33), except for small corrections due to 6. When the zeros 
of A(z) and X(z) lie outside the unit circle, as required, the unit circle in 
the z plane maps into contours in the polynomial planes which do not 
enclose 0. This makes f(¢) periodic in ¢. 

The condition on the zeros of X(z) also permits the right side of 
equation (34) to be expanded: 


e*FN(e-**) o 
Se Ge". 35 
Xe") dy Ge 89) 
Using equations (82), (83), (84) and (35) in equation (81) now gives 
(--) 2n+1 
6°") + Ge? = Ge; (36) 
o=1 g=1 
P; = G, +e Cagis ? o = n+ 1; 


G,, ntl<o<2n+1; 


d(e'*) = — >° Gye i”. 
2n+2 
This 6 is small provided the zeros z; of X(z) are such that 27°"*® is 
small. When 6 is small, the actual error extrema will differ from ¢, but 
by no more than = [6] «. 
Equation (34) requires 


5 AG) XE") + Bei’) X(e'*) = Ae’*)ei*NE'*). (87) 


The appropriate degree 7» of polynomial X(-) turns out to be one less 
than the number of poles of zB(z)/A(z) (including any poles at z = ~). 
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When the degree of A(-) is greater than the degree of B(-) and the 
zeros of A(-) are distinct, a set of 7 + 1 homogeneous equations in the 
coefficients g; of X(-) can be derived by evaluating equation (87) at 
the zeros of A(-). Then 


(1, + < Ma)Q = 0 (38) 


in which Q is the column matrix of the coefficients q; of X and M, 
and My, are square matrices of order 7 + 1. Under other conditions 
an equation of the same form can be obtained in other ways. 

Equation (38) requires «/2 to be one of n + 1 eigenvalues for which 
the matrix coefficient of Q is singular. Each eigenvalue determines a 
polynomial X(-) [including an arbitrary scale factor which cancels 
out in equation (33)]. For our purposes, we must choose an ¢ which 
is real and such that the zeros of X(z) lie outside the unit circle. This 
raises a question of the existence of a suitable e and X(-). 

When degree 7 = 0, X(-) is a constant, equation (88) is a real linear 
equation in e, and there is no zero of X(-). When 7 = 1, X(-) is linear, 
e is a root of a quadratic equation. It is not hard to show that the two 
roots are real [under our assumptions regarding zeros of A(-)] and 
that one (the larger) yields a zero 2, of X(z) such that |z,| > 1. Equality 
occurs only in singular cases such that n + 2 zero error points are 
possible [even though there are only » + 1 disposable coefficients of 
P(x)] and can be so placed that there are n + 3 equal error extrema, 
instead of only n + 2. This may be seen by assuming that the zero of 
a linear X(z) is +1, and then noting that, in equation (33), 


XC)... det 
Xe) ieee 


Conditions for the existence of a suitable « have not been established 
for y > 2. They are probably at least closely related to the (unknown) 
general conditions under which the minimax approximation has the 
equal ripple form of Fig. 1. 

The procedures described here are appropriate only when a suitable 
e does in fact exist. However, degrees 7 = 0 and 1, for which existence 
has been established, are sufficient for many practical problems. For 
approximately equal error extrema, equation (82) itself need be only 
an approximation, and polynomials A(-) and B(-) for which 7 = 0 
or 1 are likely to give a good enough approximation. This is particularly 
true when degree n of P(x) is sufficiently large so that coefficients 





eke”, (39) 
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C, ,o > n, in equation (26) approach a simple asymptotic behavior. 
Percentage variations between error extrema need not have to be 
very small even though the absolute errors must be very small. For 
example, a 10 percent variation between very small extrema may be 
acceptable, compared with large variations obtained by truncation 
of the infinite Chebyshev polynomial series. 

Table I indicates the degree m of A(-) and u of B(-) for which » = 0 
or 1. The column headed m + yu + 1 indicates the number of dispos- 
able parameters in the rational fraction A/B which can be adjusted 
to approximate the sum in equation (82). 

The procedures for 7 = 0 and 1 are particularly well suited for rapid 
explorations of available error magnitudes as functions of initial design 
parameters, such as degree of the disposable polynomial, extent of the 
approximation interval, and parameters in the approximated function. 
When only the error magnitude is needed, it is not necessary to calculate 
the coefficients of polynomial P(-), which requires the series expansion 
(35). When 7 = 0, the error magnitude is (approximately) the single «¢ 
determined by equation (38). Then simple closed form formulas can 
frequently be obtained (and will be included in 4 of the 5 examples 
in Section V). When 7 = 1, ¢ is one of the two roots of the quadratic 
equation required by equation (38). (To meet the condition on the 
zero of X(-), the larger e must be chosen.) 

When ¢ has been determined, it can be compared with the error er 
obtained by simply truncating the Chebyshev polynomial expansion 
of F(x). In terms of equations (26) and (32) 


~ Be'*) eimtDe Be") grind) e (40) 
A(e'*) A(e-*?) 


er 


Comparing the maximum er (at real ¢) with ¢ indicates the improve- 


TaBLE I—Value of m and pu for which 7 is 0 or 1. 


Degree m Degree u Degree"y 
of A(-) of B(-) of X(-) mtp+l 


NrONFS 
RPReRFOOCO 
Bee OO 
RPWNWNE 
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ment to be obtained by the minimax refinement of the truncated 
series. Frequently, max e7 occurs at ¢ = 0 or z, and then 





_ 5 BE 1) 
max er = 2 A(a1) (41) 
4.2 Weighted Minimax Approximations 
Let 
1 
P(x) = F(z) + Wa e(z) (42) 


in which P(x) is again a disposable polynomial of degree n, F(x) is 
again a given function to be approximated in the interval x, S$ « S % , 
and the new function W(x) is a given weight factor. For what P(x) 
is e(x) smallest in the minimax sense? We will again assume that the 
minimax €(x) has the equal ripple form and will seek only approxi- 
mations to equal ripples. We will also assume that W(x) is bounded 
and positive definite in the approximation interval. (A point where 
W(x) = 0 or © would probably spoil the equal ripple character of the 
minimax approximation.) 

Map from x to ¢ as before and define P(-), F(-) and £(-) again by 
equations (24), (26), and (28). Express W(x) also in terms of expo- 
nentials, but as a product instead of a sum. More specifically, let 


H(z) = —log W(x) = H¢'*) + HE**) (43) 


and assume that W(z) is sufficiently smooth, as well as bounded and 
positive definite, so that H(z) is regular at |z| < 1. Then 


i — ip —te 
W(2) _ De )De ys 
Diet) = 

with log D(z) regular when |z| < 1. This D(-) is a generalization of the 
D(-) of Subsection 3.1 and of [S(-)]? of Subsection 3.2. Frequently 
it can be found by direct factorization of a function of e’® as in Section 
III. 

Equations like (29) and (30) can now be obtained as before. The 


only difference is that H(-) must now be multiplied by the product 
of functions of ¢ in equation (44). Then equation (30) becomes 


Pe’) = F@*) + DE*)DE"*)EE*) (45) 


and equation (81) becomes 


(44) 
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2n+1 co 
dX Ge? = 5 Dee") De **)e4 (ep Qu Cosine’; (46) 
=] =0 
Gy = Pari — Cassar 5 Stor 1; 
Se is nmt1l<AS2n+1. 
Retain the rational fraction B(-)/A(-) of equation (82), but change 
equation (83) to 
gitar x D(e"**)Ae™**)X**) 4 é(e**) (47) 
De®) Ae?) XE**) 
so that 
_ DAC) XE) 
A(e'*)X(e"**) 
in which 6 and 6 are small. Then change equation (34) to 


27-19 -i¢g ig io —i(¢g) -ig 
e De FAC on8G ae AG ) _é ite ys (49) 
2 A(e’*)X(e**) A(e’*) Xler"?) 
Using equations (32), (47) and (35) in equation (46) now gives equation 
(36) again. From equation (49), equation (387) must be changed to 


De'*)De** ec” + 6(e**) (48) 


5 De) AC XE) + Be*)X(e%*) = Ale’*)e**NE**). (50) 


Equation (50) can be used to find e, and the X(-) and N(-) needed 
for equations (85) and (86). 


4.3 More General Approximating Functions 
Let 


¥[P(z), z] = Gi) + e(z) (51) 


in which ¥[P(2), x] is a given function of x and a disposable polynomial 
P(x), G(x) is a given function to be approximated by W[P(x), x] in 
the interval x, S x S 2%, , and e(2) is the error in the approximation. 
For what P(x) is e(x) smallest in the minimax sense? Under certain 
further assumptions regarding W(- , -) this approximation can be 
transformed into a weighted minimax polynomial approximation. 

Assume an inverse VW" of Y[P(x), x], with respect to P(z), exists 
over the approximation interval. Then equation (51) can be replaced 
by 


P(x) = ©" {[G(a) + e(@)], z}. (52) 
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Assume ¥7'(- , -) is sufficiently smooth and e(z) sufficiently small to 
justify the following approximation (in the interval x, S x S 4%): 
oe av"[G@), 2] 


This is in the general form (42) with 
F(x) = ¥'[G@), 2] 


Dees av "'[G(a), a] 
Wx) dG(z) 


Thus Subsection 4.2 can now be applied provided the F(-) and W(-) 
determined by equation (54) mect the appropriate conditions. Recall 
that we required W(x) to be bounded and positive definite over the 
interval of approximation. However, reversing the sign of W(x) merely 
reverses the sign of e(x). Hence, in equation (54), we need only require 
that the partial derivative must be bounded and either positive definite 
or negative definite and sufficiently smooth for log D(z) to be regular 
when |z| < 1. 
As a first example of the inversion of equation (51), let 


W(x)P(xz) = G(e) + e(z) (55) 
where W(x) and G(z) are given functions of x. Then 


G(z) + ex) _ Gz) a 1 
W(z) ~~ W(x) ° W(x) 


(54) 





P(r) = 








2(z). (56) 


As a second example, let 
[A(x) + B(x)P(x)}*? = G(x) + e(e) (57) 
where A(x), B(x), and G(x) are given functions of x, with B(x) and 


G(x) positive definite over the approximation interval. Solving for 
P(x) gives 








Pe) = AO 2 w+ yqe@- —«68) 
If the term in e”(x) is omitted 
(AG) + B@PG)}! = Ge) + eta) — 3 (59) 


in which terms in e’(x) have been neglected for « > 2. An equal ripple 
2(x) in equation (57) yields an approximately equal ripple error if the 
last term is somewhat smaller than the extrema of e(z). 
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4.4 Relation to Phase Modulation 


The function of ¢ defined by equation (28) is similar to functions 
of time ¢ used in communication theory to describe phase modulated 
signals. If » is replaced by ¢ in equation (28), 


E¢e'') of Sealed (60) 
This is the exponential representation of a phase modulated signal 
in which the carrier (radian) frequency is n + 1 and the baseband 
signal f(t) is periodic with one period every n + 1 periods of the carrier. 
The signal may be regarded as the carrier plus sequences of upper and 
lower sidebands. The upper sidebands are determined by the coeffi- 
cients C414, in equation (81). The lower sidebands are determined 
by the requirement of a purely phase modulated signal. Finally, if 
the sequence of lower sidebands extends as far as the negative carrier 
frequency —(n + 1) we truncate it at —n. 
Weighted minimax approximations can be interpreted similarly, 
in terms of simultaneous phase and amplitude modulation. 


4.5 Alternative Procedures 


It is obvious that the procedures described above can be varied in 
many different ways. A very few of the possible variations are noted 
below. 

Preliminary manipulations may be needed to obtain a formulation 
in which the disposable part is a polynomial. Also, the pertinent Fourier 
series may be sums of sines instead of cosines. Both these situations 
will be illustrated by Example 5, in Section V. 

If ov "[G(z), x]/8G(z) is expressed as a product of functions of z, 
D(e'*) can be formulated as a product of corresponding factors. A 
factor of the form (1 — 2/z,)’ contributes a factor of the form 
[M.(1 — vye**)]’, as in Section III. More generally, there may be 
advantages to replacing the D(-) of equation (44) by D(-) defined 


W(x) = De’) DE’). (61) 


Then the D?(-) in equations (48), (49) and (50) is replaced by D**(-). 
It may sometimes be convenient to express P(x) and F(x) as products 
of factors in exp (-k7¢) instead of sums, say 


P(x) = Pe'*)PE**), F@) = Fe'*)FE"*) (62) 


in which P(-) is a polynomial of degree n with no negative powered 
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terms. If a term in e’ is neglected, one can now replace equation (29) 
by 








ig -i¢g ig Be. || ~ig Be | 
Pe’*)P = lF NE :; : 63 
Cpe) = | Fe") + TEs | Fe") + Fie (63) 
Equating factors separately replaces equation (30) by 
i i E (e'*) 
P(e’*’) = Fe?’ ——s 64 
NRC Ter i (64) 


Subsequent modifications of our previous procedures are now easily 
worked out. 

It would be possible to replace equation (33) by other functional 
forms for exp [zf(v)]. The moduli must approximate unity at real 9 
and expansions in positive and negative powers of exp (ig) must exist. 
Disposable parameters are to be adjusted so as to approximate the 
required coefficients of positive powers. However, except for very 
special functional forms [such as equation (88)], the adjustment is 
likely to be a quite complicated task. 


V. EXAMPLES 


This section further clarifies the general procedures by means of 
five examples. 


5.1 Example 1 
Let 
P(z) 
UeSeg/ Zo)” 


in which 2, is a given constant, |z,| > 1, and the degree n of the dis- 
posable polynomial P(z) is large. What is the approximate amplitude 
|e| of the equal error extrema of the minimax approximation? 

This is a special case of equations (55) and (56), which can be solved 
as a special case of equation (42) for which 


=1+e@), —-lS2r¢srl (65) 


Fe) = yay = — 2/0 (66) 
To apply Section IV, define F(-) and D(-) by 
(1 — 2/x)* = Fe'*) + Fe**) = DE*)DE), 


a (67) 
C ise 
> ee, 


Fe’*) 
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Log D(z) regular, |z| S 1. 


We have already factored a linear function of z in terms of exp (+79), 
in Section III. A similar factorization now gives 


! a ete 
Der) = GE yl <t (68) 


and then the coefficients of C, correspond to an expansion of 


E — ye'*)(1 — mel) 


cee = >) Ce + 2, Cee (69) 


c=0 
To determine ¢ we only need the coefficients C, for ¢ > n, which we have 
assumed to be large. 

The following expansion of [1 — y exp (ig)], valid for |y| S 1, is well 
known 


i) 


(eaqel)t a ye Ke" 








c=0 
Ky = —1, Kk, = =yf2} (70) 
—_ — ! 
ie (20 — 3)! v’, o>2 
4°"*(¢ — 2)! a! 
When n is large and \ <n, 
Kaan  2m+AHM-1, 2 
Ka 264 2 ee 
in which 
ein oe ee ee eee 
sae ay ee mal (72) 


As a result, when n is large 


K see 9 
n 


L— yee YK ei . 73 
(1 yet)! Dy Kael? + 88 na (73) 
Now note that 
Le cma Kus | = -i (UBS key?) K ns 
era lt We ee oe OF ee re reer epee ea ee 
[ 1+y7° 1 — kye’® » ‘ (1 + y’)*(1 — kye’*) 
(74) 


If equation (74) is used to evaluate C, in equation (67), only the last 
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term contributes to C, when o > n. Then 


(ve) —_ 2\4 
>> Cee = e 2 ke ) Kner (75) 
X=0 Giska p(k skye) 
which is a special case of equation (82) with A(-) a linear polynomial 
and B(-) a constant. The corresponding X(-) in equation (47) is a con- 


stant and cancels out. Then equations (68) and (75) applied to equation 
(50) give 





—2K ass =, 
(1 — k’y’*)\(1 — ky’)? (76) 
Ne **) = G, + Ge'*. 


The constants G, and G, contribute to the two highest degree terms in 
the polynomial P(-). They need not be computed unless the specific 
polynomial is needed as well as the amplitude |e| of the approximation 
errors. 

The linear A(-) and constant B(-) determined by equation (75) can 
be used in equation (40) to approximate the truncation error for the 
polynomial approximation defined by equation (66). The corresponding 
error in equation (65) can be found by dividing by (1 — x/z,)'””. This 
gives 


Ee = 


= merle i oa tele, (77) 


eae 


ce eae ee ee < 4. Actually k < 1, but further analysis 
indicates that r will not be significantly >4 when 7” is small. 

When |x| > ©, W(a) > 1, Cas14./Crs+1 2 0, and e7 is dominated by a 
single Chebyshev polynomial (which has equal extrema). Consistent 
with this our y — 0, then G, , G. — 0 in equation (76) and r > 1 in 
equation (77). 

In equation (74), K,,,; can be determined by the formula for K, 
in equation (70). However, the following simpler approximate formula 
may be more useful: 

yet 
Kyai = 2(r)*(n ae 1)(n “fe 1/4)? (78) 
The error amounts to about 0.3 percent at n = 2 and about 0.04 percent 
at n = 6. The derivation is related to, but requires more than substitu- 
tion of Stirling’s approximation for the factorials in equation (70). 
_ Fig. 3 illustrates computed errors e(%) and e7(x) corresponding re- 
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Fig. 3 — Illustrating Example 1: (a) e(x) and (b) er(z). 


spectively to our approximately equal ripple solution and truncation 
of the Chebyshev polynomial series. The constants +e and re deter- 
mined by equations (74) and (75) are included for comparison. The 
computations started with 


t = 1.025, n= 10 
for which, as computed by equations (76), (77) and (78), 
y = 08, = 7/8, 
Kus, & 0.0006881, e = 0.0040678, r= 3.74. 
5.2 Example 2 
Let 
P(z) = (1 — 2/a%)'? +e(4), -l Sa +1 (79) 


in which the degree n of P(x) is again large and z, is again a given 
constant, |x| > 1. 

Since the function F(x) is the same as in Example 1, equation (75) is 
again valid. Now, however, W(x) = 1 and hence Section 4.1 (on un- 
weighted polynomial approximations) is appropriate. Applying equation 
(75) to equation (37) gives 

dd a ky?) K ne 


€ Perrin aca crane EY Scie a 2 


~ a +70 — #7) (80) 
Ne‘) =G 


in which N(-) contributes only to the highest degree term in P(-). The 
error ratio r turns out to be 
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5.3 Example 3* 
Let 
(1 — 2/2,)'’P(x) = 1 + e(z), -ls2rsi 


in which degree n of P(z) is again large and 2, is given, |x| > 1. 


In the equivalent weighted polynomial approximation 
Fa) = Gs = (1 — 2/my 
We) a 


Proceeding as in Example 1, one now gets 





i dl ag y’)* 
De’) = ——_", 
oe (1 — ye'*)} 
= a 1 +- 1) Kaas 
Co. + x? > — Koo 
2, Cnvinse (1 — ky*)(1 — keye'*)? 
eC 
my ont (n + 1)! [atm + 5/4)? 
pe Ys 1 ee re 
~ mt4 on + 4 


Then equation (49) gives 
— 2K ysl = ky’)! 


7 1—ky’* ? 
e7!*N(e7'*) = Nie* 
1 — ye'*’ 
vy, -£7@=")0 + Oke 
oe 1 — ky’ 
Equation (35) is now 

-ig i) 
me -ig a Ge" 

ye o=} 


(81) 


(82) 


(83) 


(84) 


(85) 


(86) 


The first 2n + 1 terms in this series contribute to P(-) per equation 


* The author has encountered this problem in connection with two different 


circuit theory studies, which will be described in other papers. 
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(36). The remainder can be summed, to get 


2Qn+1—i(2n+2) 
NGO eee ee 


6¢é"*) = (87) 


iy 


1 — ye" 
Evaluating the error corresponding to simple truncation now gives 
gts max | €r | ~ (1 os ly bye! = key’) (88) 

le] ~“Q-kly)a —- my) 


When 7 is large, k & 1 and r is so close to unity that the minimax 
refinement of simple truncation is not likely to be justified. However, 
our analysis has been useful in disclosing this fact, without the detailed 
computation of any minimax approximations. 


5.4 Example 4 
Previous work, which we shall discuss in Section VI, concerns the 
following problem: Let 
P,,(2) = P,4(x) + e(a), -1ls2¢s +1 (89) 


in which P,,,,(z) is a given polynomial of degree n + v and P,(x) is a 
disposable polynomial of degree n. For what P,(x) does e(x) have the 
equal ripple form? 

Equations (32), (83) and (37) now simplify to 


(a) vol 
De Curran(e™*) = Dy Crerenle™*) = BO), 


pees X¢"*) 
Xe”) 


(90) 





+ 6%), 


5 Xi) + BEX") = NE) 


in which B(-) is a polynomial of degree v — 1, with coefficients C,+14, 
and X(-) is a polynomial of degree v — 1, to be found therefrom. The 
coefficients of B(-) can be found by expanding the left side of the last 
equation and equating to zero the coefficients of pgsitive powers of 
exp (zy). The result can be expressed as the following specialization of 
equation (88): 


(c +£1)9 =0 (91) 


in which Q is again a column matrix whose elements are the v coefficients 
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of X(-) and I is the identity matrix of order v. The matrix C has the 
special form (assuming the elements g, of Q to be ordered per 


X(z) = YS a2’) 


CG Coa Tey Cee vy-] Ca. 
C = Case Cais ieee Cag 0 ? (92) 
Ca3- 90 oes 0 0 


When v = 1, X is a constant and the solution is elementary. When 
vy = 2, X(z) is linear. Let z, be its zero. Then equations (91) and (92) 
require 


é ++ 2C n+ 1€ — ACr +2 >= 0, 


a= a ° (93) 
n+2 
The roots of the quadratic equation in ¢ are real. When C,,, 4 0, the 
larger |e] > 2 | Caso] and |2,| > 1, as required. Then 6 in equation 
(36) turns out to be a power series in exp (—7yv) which can be summed 
to get 





2Qn+2 —i(2n+2)9 
ai Cans 2€ 


5(c'*) ae 


——; (94) 


1 — ye" 
> 1/2, ° 
When C,41 = 0, € = £2C,,. and z, = +1. But then the error due to 
simple truncation of the Chebyshev polynomial expansion of P,+2(x) 
is proportional to a single Chebyshev polynomial of degree n + 2, which 
has equal ripples with n + 3 extrema instead of n + 2. 
5.5 Lzample 5* 


As a last example consider the following nonalgebraic approximation: 
T(0) = DOA, sin o@ = 6 + £(6) (95) 
a=] 


—r<-@685 60685060. < ft. 
For what coefficients A, does the error e(@) have the equal ripple form 
and what is the amplitude e of the ripples? 


* This problem is of interest in, for example, the approximation of differentia- 
tion with a tapped delay line. A more detailed treatment is planned for a future 
paper. 
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A sequence of transformations changes equation (95) into a weighted 
polynomial approximation. First, equation (95) is equivalent to 


n=1 


T(0) = sin 6 >> B, cos (p8), (96) 


2A, = B,-1 — Bua - 


p 


Second, relate 6 to a new variable ¢ by 


sin$ = qsing, \o| <a; (97) 
. 8, 
q=sn 5 a ie 


Real » maps into —6@, S 6 S 6, . Also 
cos 6 = 1— q+ @ cos 2%, (98) 
sin 06 = 2q(1 — q’sin’ ¢)' sin ¢. 


In these terms, equation (96) becomes 
n-1 
T'(0) = 2q(1 — q’ sin’ v)' sing >> B, cos (2p¢) (99) 
p=0 


in which the set of coefficients B, is linearly related to the set B, of 
equation (96). In equivalent exponential terms 


f(6) = 71 — q’sin’® g)[PE*) — Pe”) (100) 


in which P(z) is a polynomial of degree 2n — 1 in odd powers of z only. 
Use equation (100) in equation (95) and solve for the factor in [ ]. 
The result can be expressed in terms of exponentials: 


Pe’) — P@'*) = F@'*) — Fe’) 
+ De'*)De"*)[Ee*) — E€**)] (101) 
in which F(-), D(-), and H(-) are related to previous functions by 


mt — qsin? g)? = > 2C,,-; sin 26 — le; (102) 


c=] 


00 
Re) Ss Cee, 
g=1] 


7 (1 — asin’)! = De'*)De™"*); 
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; 1 i2ey\ |} 
pei) = [mate] Iv} <4; 


E(0) = = [E(@*) — Ee}; 


E(e’*) = anaes 


It can be shown that the coefficients C.,_, obey a difference equation 
of order 2. The asymptotic behavior of the difference equation shows 
that 

20 — 1 


aS g ->@, Coox1 -(2=4) VO x4 s (103) 


As a result, for a sufficiently large n 


ioe) t(2n+1)¢ 
C (20-1) @ AY Con+1€ 
20-18 = 
g=nt+1 








1 + kre??? (104) 
ho (2st) a I. 
~ \Qn + 3/ — 4n 4+ 3 


Proceeding almost exactly as in Example 3, using equation (104) and 
the D(-) of equation (102), one can now obtain an approximation to the 
minimax error e, to the error e7(y) due to simply truncating the expan- 
sion of F'(e’*), and to the ratio r of max | e7 | and | «|. As in Example 3, 
it turns out that the minimax approximation is only a little better than 
the approximation by truncation, at least when 7 is large. 

Figure 4 compares a computed e;(v) with the approximate ¢ and 
ratio r of max | ey | to | « |, using 


6, = 170° n= 15, 

for which 
y = 0.8397 k = 0.96825, 
e = 5.690° r = 1.0840. 


VI. COMPARISON WITH OTHER WORK 


This section compares the present paper with previous publications 
in various related fields. It is not intended, however, to be an exhaustive 
survey of all related publications. 

The transformation from x to ¢ followed by distortion of the ¢ scale 
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TRUNCATION ERROR € 





TRANSFORMED @ SCALE ¢ 


Fig. 4 — Illustrating Example 5. 


to obtain an error function of the form «¢ cos [(n + 1)¢ + f(¢)] has been 
used before. Pertinent references are papers by Clenshaw’ and Stiefel,*** 
who call our f(¢) the “‘phase function”. Of these, Clenshaw’s paper is 
quite close to ours, and in fact our work might be regarded as a generali- 
zation of his. 

Clenshaw devotes much of his paper to the approximation of a 
polynomial of degree n -+ v with a polynomial of degree n, which is our 
Example 4. Clenshaw, (Ref. 2, pp. 80, 31) solves the problem for » = 2 
in a quite similar way, except that he expresses his Fourier series in 
terms of cosine functions instead of power series in e***. He exhibits an 
approximate solution which can be shown to be almost, but not quite 
equivalent to ours. We would have obtained an exact equivalent if we 
had restricted our polynomial P(-) to positive powers only, correspond- 
ing to c = 0 to n in equation (25) instead of —n to n. In equation (36), 
this restricts the disposable G,’s to \ = 1 to n + 1 and increases the 
number of terms in 6(-) to\ = n + 2to o. The result is a somewhat 
poorer, but frequently adequate, approximation to an equal ripple 
error. Clenshaw also notes how his approximation can be improved, but 
does not fill in the details. It can be shown that the improved approxima- 
tion would be an exact equivalent of ours. However, we have found that 
our formulations in terms of e**, instead of cos 9, are simpler, and also 
more revealing concerning, for example, the nature of the approxima- 
tions. 

Clenshaw (Ref. 2, pp. 31-36) also considers »v > 2, and obtains ap- 
proximate solutions for vy = 3, 4 in terms of roots of cubic and quartic 
equations. However, he retains the use of cos ¢, instead of e***. As a 
result, he does not include a formulation for a general v in terms of an 
eigenvalue and eigen vector of a matrix,-like our equation (91). 
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Clenshaw (Ref. 2, p. 29) notes that the equal ripple approximation to 
(a — x)~* has been solved exactly and cites Hornecker® and Rivlin’. 
Any solution to this problem is easily applied to more general problems 
in which the given function yields the same sort of remainder when the 
Chebyshev polynomial series is truncated. Examples are our Example 2 
and our general formulation for unweighted polynomial approximations 
with weight factor W = 1,m = 1, » = 0 in the remainder function (82). 

Our procedures are more general in the following ways: First, remain- 
der functions can have the more general form (32). For practical pur- 
poses, degrees m and yp should not be large. However, they need not be 
restricted to the special cases m = 0,» S 4andm = 1, p = 0. Second, 
minimax weighted errors can be obtained [by using suitable weight 
factors W(x) in equations (42), (44), and so on]. Third, unweighted mini- 
max approximations can be obtained with approximating functions of 
which the disposable polynomial is only a part (by solving an equivalent 
polynomial approximation with a weighted minimax error, as in our 
Examples 1, 3 and 5). Finally, relatively simple formulations have been 
obtained by using exponentials instead of cosine functions. 

Stiefel’s papers’’* have much less relevance to our work. They use 
the error formulation ¢ cos [(n + 1)¢ + f(¢)] but obtain solutions by 
numerical iteration. Reference 3 also includes a general integral equa- 
tion, which determines the required coefficients implicitly but is not 
easily solved. 

Our use of rational fractions to approximate remainder functions, 
as in equation (32), and so on, is at least reminscent of the so-called 
e algorithm. The e algorithm also uses rational function approximations 
to remainders but for a different purpose—to increase the rate of con- 
vergence when functions are evaluated from their power series. It is 
quite different from the use of rational functions in the formulation of 
minimax polynomial approximations. References for the ¢ algorithm are 
Shanks® and Wynn.° 

Our procedures require evaluating certain of the coefficients in the 
Chebyshev polynomial expansion of a given F(z) (or in the equivalent 
Fourier series expansion in terms of ¢). Various established numerical 
methods are available for this.’ The best choice depends on the form in 
which F(z) is specified (for example in closed analytic form, as a power 
series in x, or numerically at a set of discrete points). When F(z) satisfies 
a differential equation with polynomial coefficients, the coefficients in 
the Chebyshev polynomial series are related by a difference equation of 
finite order and can be computed recursively. Our Example 5 is a special 
case. The general relation is described by Clenshaw’° who also includes 
numerical tabulations of coefficients for some common functions. 
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The author’s 1952 paper on network synthesis in terms of Chebyshev 
polynomials is only remotely related to the present work.” 


VII. CONCLUSIONS 


Techniques like those described in Section IV and illustrated in Sec- 
tion V can be applied to many approximation problems in which the 
disposable part of the approximating function is a polynomial and 
approximately equal weighted or unweighted error extrema are desired. 
However, to be useful they must compete with other possible techniques, 
especially established numerical methods whereby equal-ripple ap- 
proximations are obtained by iterative improvement of a sequence of 
unequal-error approximations. This section notes some circumstances 
under which procedures like those described here may perhaps be pref- 
erable. 

First, the techniques described here are more likely to be competitive 
when the degree n of the disposable polynomial is large. When. n is large 
iterative numerical methods are more likely to entail excessive amounts 
of computing. On the other hand, certain aspects of the more analytic 
techniques described here are likely to become easier as n becomes 
larger. These concern particularly the use of a simple rational fraction 
to approximate a remainder function, as in equation (82). 

Second, the techniques described here are particularly suitable for 
exploring relationships between error amplitude | «|, the limits x, , 2, 
of the approximation interval, the degree n of the disposable polynomial, 
and other parameters in the approximating function (such as 2 in 
examples 1, 2, and 3). In explorations of this sort the computation of 
the actual coefficients of the disposable polynomial P(x) can usually 
be omitted. When 7 is large this can mean omitting most of the computa- 
tions required for a complete determination of the approximating func- 
tion. Frequently, computations which end with | e | remain very simple 
even though n becomes arbitrarily large. 

Third, sometimes, as in our Examples 1, 2, 3 and 5, our techniques 
give quite simple estimates of the advantage of an equal-ripple approxi- 
mation over simple truncation of an infinite series of Chebyshev poly- 
nomials. Such a comparison may be useful, for example, in deciding 
what sort of approximation should be computed in detail. 

More generally, an attractive combination may be an initial explora- 
tion in terms of the techniques described here, followed by the detailed 
computation of one or more preferred cases by established iterative 
numerical methods. 

We have assumed here that the parameters disposable for purposes 
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of approximation are all the coefficients in a polynomial. Preliminary 
investigation indicates that similar methods may be feasible for dis- 
posable rational functions, or ratios of polynomials, provided the poly- 
nomials in the denominators are of quite modest degree. This will be 
the subject of a later paper. 
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Multilevel Modulation Techniques 
for Millimeter Guided Waves 


By W. M. HUBBARD, G. D. MANDEVILLE, and J. E. GOELL 
(Manuscript received July 12, 1969) 


This paper describes an investigation of the feasibility of increasing the 
information transmission capacity of a guided millimeter wave communi- 
cation system by using quaternary and higher order modulation techniques 
in place of binary. It first presents a generalization of the binary system to 
higher orders and then extends the results of previously derived error-rate 
predictions. An experimental repeater for quaternary modulation which 
uses components that are similar to those used for binary modulation is 
then described along with associated equipment used for signal generation 
and performance evaluation. Finally, performance data on the repeater 
are given and compared with theory. 


I. INTRODUCTION 


The quaternary and higher-order modulation techniques which are 
described are extensions of the binary modulation technique previously 
discussed by the authors.’ The earlier system uses binary differentially- 
coherent phase-shift-keyed (B-FMDCPSK) modulation, a form of 
modulation in which the frequency of the carrier is increased or de- 
creased once during each time slot in such a manner that the phase 
(the time integral of the frequency shift) is changed by +90° relative 
to the phase of the previous time slot. Experimental repeaters were 
built which could regenerate, with a 10°° error rate, a signal which 
had been attenuated by an amount equivalent to 15 miles of wave- 
guide. In this repeater the incoming millimeter-wave signals from 
circular waveguide were passed through band- and channel-demulti- 
plexing filters, down-converted, amplified, differentially phase-detected, 
and regenerated (utilizing timing information self-contained in the 
signal) to obtain a polar baseband representation of the information. 
This baseband signal was used to drive an IF voltage-tuned oscillator, 
the output of which was amplified, up-converted, passed through 
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channel- and band-multiplexing filters and launched into circular 
waveguide. 

The modulation scheme of the system can be generalized in such a 
way that systems with MZ = 2” levels (quaternary, octonary, and so on) 
can be built with components most of which are the same as those used 
in the binary system. With such a system, the information capacity of 
the wave-guide can be increased by a factor m by using a 2” level signal 
in each of the individual channels. The increase in system capacity is 
accompanied by a decrease in immunity to noise and system degradation 
(which is common to all multilevel systems) and a slight increase in 
system complexity. 

A theoretical consideration of multilevel systems of all orders is given 
in Section IJ. This consideration amounts to an extension of previous 
error-rate calculations for binary systems.” Section III describes a 
quaternary (Q-F MDCPSK) system which has been built and operated 
at 320 megabits/s. 

Because the theoretical portion of this paper is a direct extension 
of a previous paper on a binary system it is assumed that the reader 
is familiar with the contents of that paper.” However, references to the 
binary paper are made where appropriate as an aid to the reader. 


II. THEORETICAL CONSIDERATION OF MULTILEVEL SYSTEMS 


2.1 Description of the Quaternary System 


The signal consists of a constant-amplitude angle-modulated carrier. 
The modulation is achieved by causing a frequency deviation once in 
each time slot. For the binary case the frequency deviation w() satis- 
fies the condition 


(n+4)T | 
[ awyav =o, (1) 
(n-4) T 

in the nth time slot, where a, = -+7/2 and contains the binary infor- 
mation. This signal can be written in the form 


a teee nt - i “o(t) a | (2) 


where w. is the center frequency about which the signal is deviated. 
Equations (1) and (2) hold for the quaternary signal as well. The only 
difference is that now a, can take on any of the four values --7/4,-+387/4. 

The signal space diagrams of these signals are shown in Fig. 1. The 
states marked “‘X” represent the phase states which are available to 
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Fig. 1—Signal space diagrams for DC phase-shift-keyed signals: (a) binary, 
(b) quaternary. X represents phase states available in even number time slots 
and “O” represents the phase states available in odd numbered time slots. 


the signal in the even numbered time slots and those marked “O” 
represent the states which are available in odd numbered time slots. 
Thus the transition always takes place from a state marked X to a 
state marked © or vice versa. 

Bennett and Davey describe the detection scheme for DCPSK 
modulation.® A particular embodiment of the differential phase detector 
for the binary signal of Fig. la is shown in Fig. 2. Here the relative 
delay between the two paths is T where 7 satisfies simultaneously 
the two constraints 


2 = Baud rate 


oj 


wel’ = (n + $)r nan integer. 


Figure 3 illustrates how this differential phase detection concept is 
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Fig. 2— Binary differential phase detector. 
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Fig. 3— Quaternary differential phase detector. 
extended to the quaternary case. In this case the preceding constraints 
become 
] iL 
—- ~ — ~& Baud rate 
T; 2 
wl’, = (m + 9) 


Wol’s = Not 


N,,N. integers. 


If one applies the signal described by equations (1) and (2) to the 
device illustrated in Fig. 3, he finds that the outputs V, and V~ are 
given by Table I for the four possible phase changes, a. Thus the binary 
variable V, defines the sign of a while V. defines its magnitude. Stated 
another way, given that the phase state in the (n — 1)th time slot 
was at 0 radians in Fig. 1b, V, determines in which half plane (upper 
or lower) the phase state of the nth time slot lies while V. determines 
in which half plane (left or right) it lies. 

Since each branch of the quaternary differential phase detector is 
identical to the binary device described in Ref. 2 the limiter and the 
regenerators can also be identical to those described in Sections III 
and 1.4 of Ref. 2. 

For the multilevel system, only one device which is not a direct 
adaptation of an existing component of the binary system is required. 
Its function is to translate the regenerated binary signals into a signal 


TasLe [—Ovrpruts For Four PossiIBLE PHASE CHANGES 


a Vi V2 
w/4 1 1 
—7/4 —1 1 
37/4 1 —1 


—37/4 —1 —1 
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suitable for driving the FM deviator. Its performance should be virtually 
error free and need not be considered in the error-rate calculations. 
A description of this translator is deferred to Section III. 


2.2 Hxtension of the Error-Rate Calculation 


The bit-error probability of the quaternary system described in 
Section 2.1 is equal to the probability that an error is made in one of 
the baseband binary sub-channels. This, however, is just the probability 
that an error is made in a binary differentially-coherent phase-shift- 
keyed system in which the expectation value of the second pulse is 
shifted an amount 7/4 from where it should be for binary operation. 
(This phase shift from the desired binary-system value is represented by 
the quantity 8 in Section III of Ref. 2.) The probability of bit error, II 
can be restated for the quaternary case as* 


n= 3Pi(o+ 5+ 5) + 4P,( ~t.4) (3) 


where 


Loft | cos’ ® | 
PRO =o a Pot Guam ein ol o (4) 


An approximate solution (in closed form) to this integral is derived in 
the Appendix. Here the quantities ¢ and 6 have the same meaning as in 
Reference 2, namely, 6 is the phase shift due to intersymbol interference 
and any other phase distortion in the system, and ¢ = sin™’ e where 
eis given by S/T = —10 log ef and S/T is the signal-to-threshold ratio 
of the regenerator in decibels. S/T is defined in Section 1.4 of Ref. 1 
as the ratio of the expected value of signal power to the minimum value 
of signal power which will cause the regenerator to function realiably 
(in the absence of noise). 

Values of P, for & from 0 to 30° are given in Ref. 2 for signal-to- 
noise ratios S/N = 9 through 15 dB. These results are extended in 
Fig. 4 to include values suitable for quaternary and higher level systems. 
Note that P.(&) is even. Figure 5 shows error rate as a function of S/N 
for an ideal quaternary system. 

The effects of finite S/T and 6 are more pronounced for quaternary 
systems than for binary. The threshold effect noise figure V7 defined in 
Ref. 4 is shown in Fig. 6 for a quaternary system and for a binary 

* Equations (3) and (4) follow directly from Equations (20) and (21) of Ref- 
erence 2 by replacing 6 with 5 + 7/4. 

t The equation relating S/T’ and e in Ref. 2 is incorrect. The conclusions of Ref. 


2 are not affected by this as the correct form of the relation was used in the cal- 
culations. 
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Fig. 4 — Error integral versus ® for various values of S/N. 


system. VN, is the amount by which S/N must be improved in order to 
offset the effect of a nonideal regenerator. 

The effects of 6 and S/T are determined directly from Fig. 7 which 
shows the value of S/N required for 10-°, 10°, 10°°, and 10°* error- 
rate as a function of ®. For comparison we consider the values of S/T 
and 6 which were inferred from the measurements made on the binary 
repeater described in Ref. 1, namely S/T = 10 dB, 6 = 10°. For the 
binary case the combined effect of these degradations is about 0.8 dB 
whereas for the quaternary case the combined effect is about 3.4 dB. 
The theoretically predicted values* of S/N for operation with an error 
probability of 107° are therefore 13.8 dB and 21.3 dB for binary and 
quaternary, respectively (with half the bandwidth requirement in a 
quaternary system with the same bit rate). (The experimentally de- 
termined value for the binary system is 13.7 dB.)’ 


2.3 Extension to Higher Level Systems 


In a system with 2” levels, the signal described in Section 2.1 could 
be generalized to have 2” equally spaced positions around the unit 
* This value docs not include degradation introduced into the quaternary sys- 


tem due to nonlinearity of the FM deviator. Unlike the binary case, this non- 
linearity is important in the quaternary and higher order cases. 
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ig. 5 — Probability of error in ideal binary and quaternary systems. 
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Fig. 6 — Threshold effect noise figures as a function of signal-to-threshold ratio, 
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Fig. 7— Required signal-to-noise ratio for various values of error probability 
as a function of ®. 


circle in the even numbered time slots. The differential phase detector 
would then require m of the binary differential phase detectors of Fig. 2, 
with delay lines 7, , T. --- 7, such that the values of w)7', are chosen 
to give 2” equally spaced values around the unit circle beginning at 
1/2”, that is, 


af, = Ee ea | an ence ree 


In systems with more than four levels this method of detection 
gives more than one level of pulse height at the regenerator. The follow- 
ing consideration makes the approximation that the worst-case error- 
rate applies in all cases. This results in a calculated error-rate which is 
too large by a factor which is less than the ratio M/(M — 4) for an M 
level system (M > 4). 

Equation (3) then becomes: 


T us 1 —_ sit Se 
n= 3,(6+5+2-2) +aP,(o 6 £42) (5) 


and equation (4) is unchanged. Figure 4 gives values of P)(#) suitable 
for evaluating II for values of / up to 16. Figure 7 indicates the values 
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of S/N required for 107°, 107°, 10-° and 10~* error-rate for ideal 2” 
level systems for m = 1 through 5. 


2.4 Results of the Calculations 


In an ideal repeater the signal-to-noise ratio for an error rate of 
10°° is 13.0 and 17.9 dB for binary and quaternary signal, respectively. 
This amounts to a price of 4.9 dB for the doubling of the bit rate (or 
alternatively the halving of the bandwidth) achieved by quaternary 
systems. In an actual repeater with intersymbol interference and non- 
ideal regeneration comparable to that in the 306 megabits/s binary 
repeater of Ref. 1 there is an additional degradation of about 3.4 dB 
(compared with 0.8 dB for binary signals). Thus a signal-to-noise ratio 
of 21.3 dB is expected to be necessary for a 10° error rate in the quater- 
nary repeater—a penalty of 7.5 dB compared with the binary repeater. 
For a guided-wave system of the sort described in Ref. 1 this requires 
only a 2.5 mile decrease in repeater spacing (about 10 to 15 percent) 
which might not be an unattractive price for doubling the channel 
capacity of the system. 

For systems with more than four levels, the degradation in error-rate 
performance due to S/T and 6 is even more severe. For eight levels 
(m = 8) for example, & becomes 83.2° (for the worst case) for S/T = 10 
dB and 6 = 10° and the degradation is (from Fig. 4) intolerable. Clearly 
an improvement in S/7' and a substantial improvement in 6 is necessary 
in order to make systems with more than four levels feasible. Even for 
ideal systems (S/7' = «, 6 = 0), signal-to-noise ratios of about 23.7 dB 
and 29.7 dB are required for 10~° error-rate for eight and sixteen level 
systems respectively compared with 13.0 dB and 17.9 dB for two and 
four level systems. 


III. EXPERIMENTAL RESULTS FOR QUATERNARY REPEATERS 


3.1 Description of the Experimental Repeater 


The quaternary experimental system, shown in Fig. 8 is similar to 
the binary system described in Ref. 1 with the following exceptions. 
(t) Where one differential phase detector and one regenerator were 
used in the binary, two are required. In addition, a binary device called 
a translator is needed. 
(iz) Conversion into and out of the millimeter medium was omitted. 
(iii) Modulation of a deviator by the regenerated baseband signal 
was not attempted.* 
* The technique for doing this, however, is identical to the technique used to 


synthesize the signal from the random word generator and should present no addi- 
tional problems. 
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Fig. 8— Block diagram of quaternary experiment. (a) Transmitter (word gen- 
erator and IF phase modulator) and (b) Receiver (IF amplifier; limiter, phase 
detector, and regenerator). 
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(tv) A symbol (baud) rate of 160 MHz (820 megabits/s) was used. 


In a quaternary FM phase-shift-keyed (Q-FMDCPSK) signal the 
information is contained in the phase shift between adjacent time 
slots. The phase shifts used are +37/4 and 7/4. The signal is generated 
by a voltage-controlled oscillator whose frequency is pulsed between 
sampling instants so that the time integral of the frequency shift is 
equal to the desired phase shift. An illustration of such a signal is 
shown in Fig. 9. 

A four-level baseband pulse train is derived from a two-level polar 
source identical to the one used for the binary experiment’ except for 
the rate which in the quaternary experiment was chosen to be 160 
MHz. The random binary output signal is divided into two signals. 
One of the signals is delayed a few integral time slots and the other is 
attenuated 6 dB. They are then recombined. The combination of 
+2, —2 with +1, —1 pulses produces a pulse of one of four levels in 
each time slot (see Fig. 10a). These are +3, +1, —1, and —8, which 
produce phase shifts in the deviator 37/4, 7/4, —7/4, and —37/4, 
respectively. By delaying one signal a few time intervals, we closely 
approximate the effect of two independently random binary signals and 
thus obtain a virtually random four-level signal. Observation of the 
spectrum displayed by a spectrum analyzer verifies the randomness 
(see Fig. 10d). 

Identifying the larger binary component as signal #1 and the smaller 
as signal #2 will help clarify the explanation of the regeneration process 
which follows. For the binary system, the signal was detected by a 
differential phase detector, the output of which is given by 

cos [p(¢) — ot — 7) + aor] 

where ¢ is the phase angle of the signal, 7 is the time delay introduced 
by a delay line built into the device, and w, is the angular carrier fre- 
quency. For the binary case, r was made equal to the bit interval and wor 
equal to 7/2 + nz. The operation of the differential phase detector is 
illustrated in Fig. 11. The reference phase is taken as the phase of the 
signal in the previous time slot. The information given by the device is 
the projection FZ, of the signal S along the vertical axis. The two phase 
transitions of the binary case, --7/2, are fully determined by the sign 
of HE, . 

For a quaternary signal, it is not possible to distinguish between 
transition into regions I and II or those into III and IV using only £, . 
This problem was solved by splitting the IF signal into two portions, 
one of which was connected to a differential phase detector with r = T, 
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Fig. 9 — Representative quaternary frequency-modulation differentially-coherent 
phase-shift-keyed signal. 
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Fig. 10— (a) Random four-level base-band pulses (H = 2 ns/cm); (b) sym- 
metric eye (H = 2 ns/cm); (c) asymmetric eye (H = 2 ns/cm); and (d) IF 
spectrum (H = 30 MHz/cm). 
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( REFERENCE) 





Fig. 11— Operation of the differential phase detector. 


and the other was connected to a differential phase detector with 
7 = T,. The quantities w, , T; , and JT. were chosen so that 


WoT = 16.57 and Wo T's = 167 


in order to satisfy the conditions set forth in Section II. Thus the output 
of the first differential phase detector gives Z, , and the output of the 
second /, . From the signs of #, and H, the quadrant of the signal can 
be determined as shown in Table II. Figures 10(b) and (c) are photo- 
graphs of sampling oscilloscope displays of the two phase-detector out- 
puts of a typical random signal. Because signal state #2 does not corre- 
spond to either H, or H,, a translator is required. If #,, and FZ, are regen- 
erated so that they have unit magnitude, then signal #1 is given by EF, 
and signal #2 is given by the negative of the product, #,H, . The sign of 
the product H,E, was determined by the translator circuit shown in Fig. 
12. This circuit is similar to the balanced-line logic element used in the 
binary regenerators except for the input circuit; it functions as follows. 
Without input, once each time slot either diode A or diode B must 
switch into its high-voltage state while the other diode remains in its 
low-voltage state. Applying positive bias causes diode B to switch 
giving positive output pulses while applying negative bias produces the 
opposite effect. For the translator circuit the bias is set so that diode B 
always switches into the high-voltage state in the absence of input. The 
input to the circuit is the sum of the regenerated outputs of the two 
differential phase detectors H’, and E,, . If the sum of the two signals is 
zero, diode B switches so the output pulse is positive. If the sum of 
the two signals is either positive or negative diode A switches and a 
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TABLE I] — DETERMINATION OF THE QUADRANT OF THE SIGNAL 





Phase Shift Original Original 
Quadrant at Transmitter E, En Signal #1 Signal #2 
I «/4 “+ ze a = 
II 37/4 + — + + 
III —37/4 — _ — — 
IV —7/4 — + ~— + 


negative output results, since for either polarity, current flowing through 
the steering diodes, D, and D, , increases the current through diode B. 
Thus the translator output is equivalent to the product of the re- 
generated LH, and E, signals. 


3.2 Results 


The error rate was measured as in the binary experiment, except 
that the two binary components of the regenerated quaternary base- 
band signal were compared separately with their properly delayed 
_ counterparts comprising the input baseband signal. 

The results of the error-rate measurements are shown in Fig. 13. 
Curve 1 shows the error-rate performance which was obtained when 
the equipment was originally built. At high error-rates (above about 
5 X 10-°) the performance was in good agreement with the theoretical 





160 MHZ o 
DELAY LINE O+ 
CS 6) : BIAS 
YY CONTROL 
INPUT OUTPUT 
DELAY LINE O- 
O 
160 MHZ = 
(180°) 


Fig. 12 —Schematic of translator circuit. 
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Fig. 13 — Results of error-rate measurements, 


predictions. There was, however, a “floor” at an error-rate of about 
4 X 107’ and no increase in the S/N would reduce the error-rate below 
this value. It should be noted that the degradations considered in 
Section II shift the error-rate line to larger S/N and decrease the slope 
slightly Gn magnitude) but do not tend to establish a “floor” such as 
the one shown by curve 1. It might also be mentioned that a ‘‘floor”’ 
similar to this is characteristic of experimental error-rate curves for 
the binary repeater of Ref. 1, but there it occurred typically at error- 
rates below 107*° and was therefore considered insignificant. 

Thus the indication was that the “floor” was the result of some 
degradation which was neglected in the theory and to which the four- 
level system was far more sensitive than was the two-level system. 
One possibility was that the impairment was due to slight mismatches 
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among the commercial components. Improved performance was achieved 
by careful selection of components with particular emphasis on the 
linearity of the deviator. Curves 2 and 3 of Fig. 13 show the typical and 
best performance, respectively. The “floor” was at 8 X 107’? when 
curve 3 was observed. Thus the agreement between theory and experi- 
ment is fairly good at error-rates substantially above the experimentally 
observed floor. 


3.3 Conclusions 


The quaternary experiment was restricted to the investigation of 
the modulation and regeneration aspects of a repeater system. How- 
ever, the data obtained, coupled with the results from the previous 
binary experiment, suggest that a quaternary (Q-FMDCPSK) re- 
peater system is feasible and might have applications where the con- 
servation of bandwidth is desirable and the cost in terms of noise 
immunity can be afforded. Systems with eight or more levels do not 
seem feasible at the present time. 


APPENDIX 


Livaluation of the Error-Rate Integral 


The integral P)(&) [equation (4)] or its equivalent 
dade / . / + a re ee ee 
Po=ar jf xp l-@ — py — Y — BW] 


LGe 1 YQ 
-erfe E 7 4g | dx dy (6) 
is frequently encountered in error-rate calculations for digital phase 
and frequency modulated signals.’’?'**? The equivalence of these two 
forms can be shown by arguments similar to those of Ref. 5. Namely, 
the integral in the form of equation (6) is written in polar coordinates 
and the integration over r is performed to give 








2 
P, = a exp (—p’ sin’ ¢) erfc [¢ cos @ + 7)] 


3 
oF exp (—p’ cos’ ¢) + p cos¢ a erfe (—p cos | do 
where 


p=(p+p), y= ain ca ain q=(¢@t+ @). 
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This integral can be greatly simplified by writing the error function 
compliments in terms of their respective error functions and making 
use of the odd parity of the error function. When this is done the only 
nonvanishing terms are (after some simplification) 

1 1? (p/a) [(p?—z?)! cos y-z sin ¥] 
Pyp==- - | a exp (—2 — y’) dxdy. (7) 
Do? ae ite, take 
The limits of integration in equation (7) form half of an ellipse. Because 
of the spherical symmetry of the integrand we can pick any half of this 
ellipse. Thus the integral can be written 


1 x/2 R(¢) 3 
Py = a If exp (—71')r dr do 
where (after some simplification) one finds 


S cos’ ® ; 
R@) = F + sin @sin a 


and 





c cos 
s=P Ea cos b= 2p 


Performing the radial integration gives 


-i[" ( S cos’ & ) 
P,(®) = Te ad 1 + sin @sin 0 ae 
which is equation (4) with the substitution 


i 
eee 
As a first step in finding an approximate solution to P)(@) we can 
write 


a/2 
P(®) = = exp [—S cos’ &(1 — sin @sin 4)] dé. 
2r ~—1/2 
But P(#) can be integrated to give 
P(&) = 4 exp (—S cos’ &)J,(S cos” ® sin ®) 


where J, is the modified Bessel Function of the first kind. Now P(®) 
must be a good approximation to P)(&) for sufficiently small ®. In fact, 
for 6 = Oand 6 = +7/2 we have 


PQ) = Po(0) = 3 exp (—S) 
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and 


rag) ~ nag) ~a 
respectively. Thus P(@) and P)(¢) agree at both extremes. 
Let 3(P) be the integrand of the integral for P and J(P,) the integrand 
of that for P, . Their ratio is 
_ g(P) {8 sin? (26) sin? ‘\ 
a S(Po) XP \4 "1+ sin @sin 0 


We observe that R = 1 everywhere. In the following argument we 
assume for clarity that @ > 0; but it can easily be verified that similar 
arguments can be constructed for the case @ < 0 and the same results 
obtained. 

R(, 6) has an absolute maximum Ry; at 6 = —7/2 and a relative 
maximum Rey, at @ = 1/2 

Rice e. sin’ 2 \ 

SE = lain 


Pits Saas 8. sin” 2 \ 
he SEP a Te (in | 


Thus we have rigorous bounds 
P(®) 2 Po(#) 2 P(®)/Rax - 


Unfortunately these bounds are often too loose to be of much value. 

Inspection of the integral for Py reveals, however, that almost none 
of the contribution to the integral comes from the region near @ = —7/2 
and in fact almost all of the contribution comes from the region near 
6 = 7/2 where R(®, 0) % Rea . This suggests that we consider the 
expression 


P,(®) = P(®)/Reu 


5 exp \-s (1 — sin s)} I,(S cos’ & sin &) 


as an approximation to P,(&). Note that P,(#) also possesses the 
property that 


P,(0) = P.O),  Pi(aa/2) = Po(-k7/2). 
Finally, one finds empirically that a somewhat better approximation 
is given by 
P.(%) = P,(®)[1 + 3 | sin (26) |]. 
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It has been verified by numerical calculations of P, that for values 
of S and ¢ which give 107"* < P, < 107° the accuracy of the approxi- 
mation is better than 10 percent for € < 50° and remains better than 
36 percent up to @ = 80°. In this range of error rates a variation of 
40 percent in P corresponds to a variation of a few tenths of a decible 
in S/N. This form is most useful for binary systems (® small) even 
though it is valid for a wide range of values of ®; a simpler form is 
derived below which is valid for large @ and is therefore more con- 
venient for quaternary and higher order systems. 

It has been pointed out by H. O. Pollak that the integral 


oe 
i exp (—2)I,(a sin ®) dx 
Ss 
which arises from consideration of a sine wave plus random noise’ is 


closely related to the error-rate integral P)(#). Pollak’s proof is sketched 
below. 











Set 
p= Scos , a=sin®, 
—] 1 a 
VT ie Tega 8 Ta 
Then equation (4) becomes (after some simplification) 
a 2b 
oo) 
P. = L+o/ exp (—py) dy 
°  2x(1 — a”)? 


ca oe — 4/2)3 
But’ 


26 
exp(—py) , _ Ee 
[ Ope pe = bp) I (bp) 


from which it can readily be shown that 
Po = we / exp (—2)I)(x sin ®) dz. 
s 


From this form of the error-rate integral an approximate solution 
can be derived when S sin @ is sufficiently large to expand the Bessel 
Function as* 


* For |z| > 0.15 this approximation is valid to within 17 percent. 
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TaBLE IIJ—-S/N INcrEASE FoR VARIOUS NUMBERS OF LEVELS 


Approximate increase in S/N in dB 
from the binary coherent phase- 


Number of levels shift-keyed case for same error-rate 
2 0.5 
4 5.3 
8 11.2 
16 17.2 
32 23 .2 
and so on and so on 





Making this substitution and performing the integration gives 


: | 2+ Lose #1] erfe {[(1 — |sin @ |) S]}}. 


When both | sin | > 0.15 and | sin | > 0.15/S hold, this can be 
written, to an accuracy of a factor of two, as simply 
P, = derfe {[(1 — | sin & |) S*}. 


For cases of interest for quaternary and higher order systems these 
constraints are well satisfied and this approximation is very good. 
For the M-level case with 6 = ¢ = 0, (no phase distortion) 


erfe HE — gin é = x) s\ | 
at {[sGw s)5]} 


For large M this becomes 


P. = 


II = 


wie 


bol 


II = 3 erfe (= st). 


Thus, the S/N must be increased by 6 dB each time the number of 
levels is doubled if the error-rate is to remain constant. This is illus- 
trated in Table III. 
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Crosstalk in Multiple-Beam Waveguides 


By DETLEF GLOGE 


(Manuscript received August 6, 1969) 


Crosstalk limits the number of communication channels which are 
spatially resolvable at the end of a beam waveguide. The main sources of 
crosstalk are scattering and distortion by the focusers. A careful study of 
high quality front surface mirrors led to the results of this paper. The best 
choice seems to be a periscopic guide made of dielectric mirrors when used 
with gaussian beams in a particular mode of multiple transmission. We 
give a closed description for the expected power profile of a gaussian beam 
that has passed such a guide and an approximate formula for the mutual 
crosstalk between several such beams. 


I. INTRODUCTION 


Arranging many optical channels spatially resolved in the same 
waveguide is a simple means for high capacity transmission.*’? All 
channels can be modulated in the same frequency band as long as the 
crosstalk is kept below a certain threshold. One source of crosstalk 
is the inevitable scattering from the focusing and directing elements.° 
Diffraction from the ideal beams is negligible.* Yet we shall see that 
diffraction must be considered once the beams are distorted by the 
focusers. 

In all likelihood, these focusers would consist of mirrors because 
lenses of the size needed are apt to have imperfections within the ma- 
terial. The scattering characteristics of high quality front surface mirrors 
and lenses of the best quality have been measured recently.’’® A com- 
parison shows that the lens scattering was about one order of magnitude 
larger. Directional changes can easily be accomplished by using peri- 
scopic mirror arrangements of the kind shown in Fig. 1.’ Neglecting 
aberrations, we consider only the first order focusing effect of these 
periscopes, which is that of thin convex lenses. 

Two methods of multiple channel transmission have been suggested.” 
We discuss these two basic methods with respect to their susceptibility 
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Fig. 1— Sketch of a periscopic waveguide. 


to crosstalk when mirrors of the kind measured in Ref. 5 are used in 
the waveguide. 


II. MULTIPLE CHANNEL TRANSMISSION 


The useful cross section of the beam guide is limited by the size of 
the periscopic mirrors. Without unreasonable effort, mirrors of good 
optical quality can be made 20 to 30 cm in diameter at the most. The 
bundle of beams must clear the mirror edges by a wide margin at all 
times to guarantee safe operation. This implies that diffraction cross- 
talk caused by the mirror edges is negligible. Tolerances which would 
allow for controlled diffraction, as suggested by Ref. 4, seem unreason- 
ably tight. Thus we arrive at a radius & of about 10 cm for the useful 
cross section. The spacing D of the focusers is limited by the terrain 
and the cost of the straight sections in between. It will most likely be of 
the order of 100 m. For optimum conditions, the effective focal length 
of the focusers should be half their spacing, although some deviation 
can be tolerated in this respect. 

Consider the waveguide as a periodic lens system which images an 
array of transmitters into a similar array of detectors. This is basically 
what the imaging method (in Fig. 2a) does. Diffraction effects can be 
minimized if every transmitter radiates a coherent gaussian beam. As 
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these beams propagate in the guide, their sizes change periodically from 
the fairly small transmitter spot size to a size close to the cross section 
of the guide. 

This periodic change is avoided by the grouping method sketched in 
Fig. 2b. The beams arrange in groups and open up to the fundamental 


mode radius 
w = (24) (1) 


T 


before they enter the guide. They keep very close to this radius through- 
out the transmission. Figure 2b shows the grouping method for two 
groups containing two beams each. Special collector lenses single out 
the groups at the end and focus the beams well separated on the de- 
tector array. For a better understanding of this detection system, 
consider the focal length f of the collector lenses to be short compared 
to the distance D between a collector lens and the preceding focuser. 
In the plane of this focuser, all groups of beams form overlapping 
patterns. Every collector lens selects the pattern of its group and images 
it into the detector plane scaled down by a factor {/D. Consequently, 
the detector array of every group is confined to a circular area with a 
radius Rf/D. 

The density of resolvable beams in the system is determined by beam 
distortion and scattering rather than the spread of the ideal beams. 
The distortion of the beam profile determines the receiver size required 





Fig. 2—Schemes for spatially resolved transmission (a) the imaging method 
(b) the grouping method. 
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to secure safe reception. One would like to make the detectors as small 
as possible in order to minimize the scattering collected from adjacent 
channels. For further reduction of the crosstalk, one has to increase 
the spacing between the detectors. 

Let us assume that the center-to-center spacing is s for the collector 
lenses and o for the detectors in a group. In this case we would have 
«R*/s’ groups and rR’f?/D*c” beams per group if the guide were per- 
fectly confocal. If we allow a slight tolerance for the spacing of the 
focusers, Ref. 2 shows that the groups belonging to off-center collector 
lenses cannot be completely filled. For this reason the total number of 
channels is only half the theoretical maximum, namely 

29442 
= 3 Fed 2) 


Rather than considering the detector plane, let us look at the distri- 
bution every group has at the focuser preceding a collector lens. This 
way our results become independent of the focal length f of the collector 
lenses and a function of the beam waveguide only. In the plane of the 
last focuser the beam spacing is Do/f. As the beams have the same 
width there as at the collector lenses, it seems reasonable to set 


s = Do/f. (3) 
Inserting this into equation (2) yields 
7 Ps (ey 
NS ots) (4) 


In the following let us assume that the detectors are simple quantum 
counters, have a circular area, and have a radius df/D. Transforming 
this back to the last focuser, we find a circular area of radius d suscepti- 
ble to crosstalk around every beam. In Section II we evaluate scattering 
and distortion of a single beam in the plane of the last focuser for the 
case of a waveguide of n mirror periscopes. Since we use direct de- 
tection, we may neglect phase front distortion. The case of heterodyne 
reception is briefly discussed in the appendix. The results are very 
similar to those of the direct detectors. 


II. DISTORTION AND SCATTERING 


Both distortion and scattering are a consequence of irregularities 
on the mirror surfaces. The distortion originates from smooth imper- 
fections extending over areas comparable to the beam cross section 


CROSSTALK 59 


while scattering is caused by a surface roughness correlated over dis- 
tances much smaller than the beam diameter. Both irregularities are 
part of a statistical function 6(z, y) which describes the deviation from 
the ideal surface. Taking a meaningful average over an ensemble of 
test surfaces leads to the structure function 


A(p) = ([d(z, y) — 6(@ — p cosa, y — psina)]’)sy (5) 


where p and a@ belong to a polar coordinate system which has the point 
(x, y) as its origin. Writing A as a function of p only implies the assump- 
tion that 6 is stationary and isotropic, which seems justified for the 
statistical properties involved.’ 

A light wave of wavelength reflected off the imperfect surface 
suffers a phase front distortion 


ote, 0) = = a(x, v). ) 


We neglect reflection loss which we assume to be uniform over the 
surface. For gaussian statistics* 


(exp a[y(z, y) — o(@ — p cosa, y — psina)]). = exp | -8¢ ace) | 


(7) 


This equality will be used to calculate the power distribution p,(r) at a 
distance D from the reflecting surface. Assume that the reflected beam is 
circular, symmetric, and would have a power profile po(7) at a distance 
D if the reflection were ideal. Then, from Ref. 8, one obtains 


gi(p) = go(p) exp | -8 ace) | (8) 


where g:(p) and go(p) are the Hankel transforms of p,(r) and p,(7), 
respectively. This Hankel transformation is defined by 


(0) = 2 | pilt)So@npr/Dr)r ar (9) 
or 
Qa = 
pilt) = sR | gulebJoC2ner/Dr)p dp (10) 


where J 1s the Bessel function of zero order. The quantity p,(r) has to be 
understood as an average over an ensemble of equivalent surfaces. 
For an accurate confocal spacing of the periscopes, a beam and its 
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distortion in the guide reproduces itself at every second periscope. 
These periscopes only contribute to the phase front distortion in the 
detector plane, while all odd ones deteriorate the power profile as well. 
We have n periscopes with 2n surfaces, half of them contributing to 
the profile distortion. Since the imperfections of all surfaces are un- 
correlated, we may write 


Gn(P) = go(p) exp | 8 ao | (11) 


for the detector plane. 

In a guide with thousands of focusers, accurate confocal spacing 
requires tight tolerances for the focal lengths and spacings. In a practical 
guide, the focusers will be kept only nearly confocal and, in general, 
will not be at positions at which previous distortions are reproduced. 
If all positions are equally probable along the guide, equation (11) can 
be adapted in the following way® 


8r 2n 


g(p) = go(p) exp | 8 20 f° A(p sin &) ae]. (12) 


Notice that for § = 0 or z, we have A(O) = O and no change of the 
power distribution, while for § = 7/2 the profile distortion is a maximum. 
A fairly reliable functional approximation for A in the range p = 0.01 
to 1 mm was derived from scattering measurements around a test 
beam.” The scattering is an effect of the mirror surface roughness 
averaged over the area covered by the test beam. This average is 
equivalent to an average over an ensemble of test surfaces. As a conse- 
quence, the variance of the scattered power is small, that is, the scat- 
tered power actually measured is very close to the average power. The 
measurements were practically the same for all test surfaces. 

This is not true for the processes involved in beam distortion. In 
this case, the 6-components participating are correlated over areas 
comparable to the test beam, no averaging is accomplished by the 
measurement, and the result can be grossly different from one surface 
to the next. It is this difference between scattering and distortion 
which makes scattering measurements feasible but distortion measure- 
ments tedious and expensive. Distortion is not sufficiently described 
by an average power profile; instead one needs to know the complete 
probability distribution of the power at every point in the beam cross 
section. In this situation some grossly simplifying assumptions are 
necessary to tackle the distortion problem with the scant experimental 
evidence available. 
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We derived the functional approximation 
A(p) = Ep for p= 0.01 --- 1mm (13) 
with 
E = 2.4 X 107% (14) 


from measurements.’ This approximation is plotted in Fig. 3. For 
larger p we have only one reference point: the quality factor of the 
mirror, given in fractions of the green wavelength, which specifies 
6-components correlated over areas comparable to the polishing tool. 
This will be typically several centimeters. We know A decreases for 
smaller p and merges into the linear function (13) at p = 1mm. As a 
convenient approximation, let us assume that A is linear everywhere. 
This function would correspond to about d,;ecn/50 at several em. 

Whether equation (18) is a good approximation for p < 0.01 mm 
we do not know, but this is of little interest, since components at 
these small 6 generate scattering which does not reach the next focuser, 
but is absorbed by the guide wall. 

A gaussian beam of unit power has the profile 


polt) = <5 exp (—21"/w?). (15) 
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Fig. 3— The linear structure function A(p) calculated from scattering profile 
and quality tests. 
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Its Hankel transform is 
go(p) = exp (—1'w"p /2D’)’). (16) 


Using equations (10), (12), (13), and (16), we find the following average 
power distribution at the detector 





= 2," exp (24), 2) 
o(r) = wut I, exp e 5 J ae u du (17) 
where 

TW wk D 

w= P and a= 64 aa (18) 

The average power falling outside a circular area of radius z is 

P@) = an | p(r)r dr. (19) 
Using the identity 

J,(2)z = [ Jo(vyv dv, (20) 

0 


we arrive at 
_ 22 f” w+ au) (2) 
Pg) =1- 7 [ exp (ob Jy a du. (21) 


The result of the machine evaluation of equation (21) is plotted in 
Fig. 4. For a = 0 the beam is undistorted and P(z) is gaussian. Yet 
P(z) has a tail decreasing with 1/z for finite a. 

In the course of our calculation we want to know the radius z out- 
side of which a given power P can be found for a certain parameter a. 
For this purpose the function 2(P, a) is plotted in Fig. 5. It can be 
approximated by the expression 


Z E In 3; + (16H) (2 — 1) | (22) 


where equation (18) was inserted for a. 

The first part of equation (22) is an inverse gaussian and depicts 
the coherent beam, while the second part accounts for the incoherent 
portion. Equation (22) can be used, for example, to calculate the 
detector radius required at the end of a periscopic guide. In this case 
one will probably allow the second term in equation (22) to be about 
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Fig. 4— The power fraction P expected outside the circular area with the nor- 
malized radius z/w for various distortion coefficients a. 


equal to the first. This would mean that the beam deterioration be- 
comes just noticeable, but not yet dominating, at the end. 

We shall find another application for equation (22) in the course of 
calculating the scattering crosstalk. In this case we require P to be so 
small that the second term in equation (22) exceeds the first even for 
moderate distortions, and equation (22) can be approximated by 





16EnD 
Pp (23) 


2= 

Notice that the only guide dimension that enters into this formula is 
the total transmission distance 

L = nD (24) 


from one repeater to the next. Equation (22) leads to an estimate for 
the detector sizes required and with this information and the help of 
equation (23) we can evaluate the scattering crosstalk. 
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Fig. 5 — The functional relation z(a) with P as a parameter. 


1V. BEAM SPREAD AND CROSSTALK 


The interchannel crosstalk at the end of the multiple beam guide is a 
function of the detector size. In order to minimize the amount of light 
collected from other channels, the detectors should not be larger than 
is absolutely necessary for signal detection. A few percent of the signal 
power could even be sacrificed. The signal fraction to be detected will 
depend on the signal levels available and on the noise sources involved, 
but it is probably safe to assume that, on the average, 75 percent of the 
total signal power will be sufficient. Thus we obtain from equations (22) 
and (24) for the detector radius 


213 
d= & In2 + 15(16 Li) | : (25) 
In a more general sense, we may interpret d as the average radius of a 
distorted beam at the end of a guide of length Z. In equation (25), w 
is the radius of the ideal gaussian beam which may vary considerably 
along the guide as, for example, in the case of the imaging method. 
For the grouping method, w is constant and given by equation (1). 

To compare both methods, let us consider a practical example of a 
waveguide with 100 m section length operating at a wavelength of 
1 wm over a distance of 50 km. If we use the grouping method, we find 
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w = 5.65 mm from equations (1) and (25) yields a beam radius of 
8.8 mm. In the case of the imaging method, w varies about 5.65 mm 
from lens to lens, being much smaller than 5.65 mm at the detectors. 
Yet at the end of a 50 km path, the average radius of the distorted 
beams will not be smaller than 7.5 mm because of the second part of 
equation (25). This is only slightly less than the radius of the grouped 
beams. It is obvious from Fig. 2 that under these circumstances the 
imaging method loses its advantage. Actually, in this case, the imaging 
method can only accommodate the beams contained in one group of 
the grouping method. Therefore, in the following we consider only the 
grouping method. 

For the calculation of the scattering crosstalk we restrict ourselves 
to paraxial beams. Any two beams of this kind are equivalent in the 
sense that the amount of light scattered from a beam 1 into another 
beam 2 is equal to the amount scattered from 2 into 1. In the same 
way, the scattering from one beam into all others is what the beam re- 
ceives from all others. This is exactly the crosstalk we want to caleu- 
late. Thus, in order to consider the worst situation, let us select the 
center beam and calculate what it scatters into all extraneous receivers. 
To do this we have to integrate the scattered power falling into the 
detector plane, excepting the center detector and the blind area between 
all detectors. We remember that the detectors have a radius d and are 
spaced by a distance s center to center. We obtain a reasonable and 
conservative approximation if we collect all the scattering outside a 
circle with radius s/2, which is P(s/2) from equation (21), and multiply 
this by a density factor 1d’/s”. Consequently, the crosstalk which the 
center beam inflicts upon, and receives from, all other beams is 


2 
C= ™- P(s/2). (26) 
For all practical cases the tolerable C is so small that we may use the 


approximation (23) for P. Inserting equations (1), (24), and (25) into 
equation (26) we obtain 





3 
c= 32 net 30m (15) . (27) 
Ss AS 

Figure 6 shows the signal to crosstalk ratio 1/C plotted in decibel 
versus the spacing s for the previous example, that is, D = 100 m and 
\ = 1 um. Also shown is the guide capacity N to be achieved by the 
grouping method in a guide of 10 cm radius. The transmission distance 
L between repeaters is the parameter. For reasons explained previously, 
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Fig. 6— Crosstalk versus the beam density and the capacity for various re- 
peater spacings (A = 1 um, D = 100 m, and R = 100 mm). 


the capacity achieved by the imaging method is less, for practical sys- 
tems only about N?. Figure 7 shows the useful guide radius required 
for a certain capacity at various repeater spacings if a crosstalk level 
of 23 dB is tolerable. Both Figs. 6 and 7 exhibit a system with NV = 100, 
L = 50 km, R = 10cm, D = 100m, A = 1 wm, and C = 23 dBas 
feasible but also (more or less) as a practical limit. 

The grouping method uses collector lenses in front of the detectors. 
Diffraction at these lenses must not cause excessive crosstalk even if 
the beams are badly distorted. For this reason the apertures have to be 
fairly large. If the available space is fully used, the lenses touch one 
another and are arranged as in a fly’s eye lens. The lenses should be 
so large that the power at the lens edges is mainly incoherent and not 
part of the coherent, though distorted, beam. In this case, diffraction 
does not substantially increase the total power outside the signal 
beams. This requirement sets a lower limit to the beam spacing s 
which is simultaneously the diameter of the collector lenses. How far 
this limit is approached by the system depicted in Figs. 6 and 7 is a 
difficult question to answer. 

A qualitative approach is tried in Fig. 8 where the previous results 
are also summarized. Figure 8 shows the power expected outside a 
given aperture after a beam has passed a length LZ of periscopic wave- 
guide. The beam is supposed to start with a fundamental mode radius 
w = 5.65 mm in a guide with 100-m section length. Also shown is the 
power P(s/2) falling outside a collector lens of radius s/2 where s is 
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chosen with the help of equation (27) to guarantee a crosstalk level 
of C dB. Thus, once we have decided on the crosstalk level and the 
transmission distance, we find the radius of the collector lens and the 
power falling outside this lens from Fig. 8. For Z = 50 km and C' = 23 
dB, this power seems to be composed mainly of scattered light so that 
diffraction should contribute little to the overall crosstalk. 

Several discrepancies become apparent when the results of this 
paper are compared to previous publications. The power P falling out- 
side a circle with a radius z after only one reflection is obtained if we 
replace equation (12) by equation (11) and set n = 1 in the derivation 
of equation (23). In the case of a linear structure function, equations 
(11) and (12) differ by a factor 7/4, and therefore 


P= 4r aa (28) 

The same physical problem was approached on a different course in 

Ref. 5 and is expressed in equation (16) there. That result differs from 

our equation (28) by a factor of four. The reason is a factor of four 
erroneously introduced in equation (13) of that publication. 

In Ref. 3 the crosstalk of one beam into one other beam was meas- 

ured. This quantity can be calculated on the basis of this paper. The 
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power density at a distance z from a scattering beam is 


(oo (29) 
with P(z) from equation (23). A small aperture of radius d displaced by 
a distance s from the beam center collects approximately 

c = md’q(s) (30) 


with g from equation (29). This is the crosstalk in the second beam. 
With equations (23), (29), and (30) we obtain 
_ PEL, 
s 





(31) 
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The specific dimensions in the experimental arrangement of Ref. 3 
were s = 5 mm, d = 2.5 mm, and d} = 0.63 um. The transmission 
distance was equivalent to 8.5 km of a periscopic guide. By inserting 
these data into equation (31) we obtain a crosstalk level of 19 dB. 
The measured level was 30 dB. Moreover, equation (81) suggests that 
the crosstalk decreases with the third power of the beam spacing. In 
Ref. 3, on the other hand, a decrease with the fourth power of the 
beam spacing was measured. The reason for the discrepancy is still 
under investigation. The comparison seems to indicate that the data 
used here are on the conservative side. 

Finally let us compare our results to the diffractional crosstalk which 
ideal gaussian beams experience along a wave guide. Reference 4 
considers this situation giving the following results. A guide of 10,000 
focusers, 100 m apart and 5 cm in diameter, could accommodate 16 
beams with only 60 dB crosstalk. According to our findings, scattering 
in this arrangement causes 50 dB crosstalk in one 100 m section. This 
underlines the severe limit which scattering and distortion set to multi- 
beam transmission. Improving the optical surfaces would be a worth- 
while undertaking. 


V. CONCLUSIONS 


Scattering and distortion of the beams in a beam waveguide are 
caused by surface irregularities of the focusers. There is experimental 
evidence that these irregularities can be described, as a first approxi- 
mation, by a linear structure function. Based on these findings we 
predict a beam distortion and an incoherent background radiation 
which both increase with the transmission distance L. The distortion 
makes it impossible to use a simple transmission method which images 
an array of transmitters into the detector plane. The method which 
seems more practical arranges the beams in groups and transmits them 
with unvariable width. 

The incoherent background of scattered light around every beam 
fades off with the third power of the distance from the beam. This 
causes a crosstalk inversely proportional to the third power of the 
beam spacing. The beam density is limited by the crosstalk tolerable 
after 50 km. If we set this level at 23 dB, allow a beam bundle 20 em in 
diameter, and operate at a wavelength of 1 um, we could accommodate 
about 100 beams. This is based on the assumption that periscopic 
focusers are used which are made of high quality front surface mirrors. 
A critical comparison with previous publications suggests that our 
results are conservative, if not pessimistic. 
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APPENDIX 


Heterodyne Detection 


One might consider heterodyne detection as a way to reduce the 
scattering received in every channel. Local oscillator beams could be 
brought in line with the signal beams, utilizing beam splitters. The local 
oscillator beams would discriminate to a certain extent against the 
incoherent background of scattered light from other channels. To 
compare this with the quantum counters discussed in the text, let us 
assume that the scattered background light is uniform in the vicinity 
of the local oscillator beam. For this case Siegman has calculated the 
IF-photocurrent noise of heterodyne reception.” He found that the 
“integrated effective detector area’ of the heterodyne detector is 
equal to )’. 

We now calculate the “effective detector area” for our quantum 
detectors. Every detector has an area 





he 2 
A, = me (32) 
and is preceded by an aperture with the area 
A, = 1s’. (33) 


The distance between aperture and detector is f. From Ref. 10 we find 
the “effective detector area’’ for this arrangement to be 


A,Ae rs’ : 











fe > pF (34) 
Inserting equation (1) we obtain 
2°72 
aa =~ ae (35) 


If we had sd = w’, the quantum counter would discriminate as well 
against scattering as the heterodyne detectors. 

In the case of distorted signal beams, however, both schemes cannot 
recover the full signal power. What is important in this case is the 
ratio of signal to background light collected in the respective cases. 
For this reason, the quantum counter can be equivalent to the hetero- 
dyne detector even if s and d are slightly larger than w. However, for 
reasons explained in Section IV, s will be considerably larger than w to 
avoid diffractional crosstalk. The heterodyne receiver therefore surpasses 
the quantum counter. On the other hand, the complexity of the former 
probably makes up for this advantage. 
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Asymptotic Analysis of a Nonlinear 


Autonomous Vibratory System 


By J. A. MORRISON 


(Manuscript received June 138, 1969) 


A system consisting of a spring, dashpot, and mass upon which is 
mounted an eccentric driven by a motor with a linear torque-speed char- 
acteristic, is analyzed by perturbation procedures based on small reciprocal 
of rotational inertia. Periodic solutions of the third order system, which 
arises when the angular position of eccentric mass ts taken as the new 
independent variable, are constructed, and their stability is analyzed. An 
asymptotic solution is also obtained which 1s more general than a periodic 
solution, in that the averaged rotational speed is a slowly varying function, 
rather than a constant. The results are applicable to the determination of 
the interaction between the rotational motion of a flexibly mounted motor 
and the translational vibratory motion of tts frame. 


I. INTRODUCTION 


In a recent paper Senator analyzed a system consisting of a spring, 
dashpot, and mass upon which is mounted a rotating eccentric weight 
driven by a motor with a linear torque-speed characteristic." This 
system has been analyzed by several authors, under different as- 
sumptions on the values of the parameters of the system (see Refs. 5 
through 9), and Senator discusses their results. The system is a model 
for the interaction between the rotational motion of a motor driving 
an eccentric and the translational vibratory motion of the frame, which 
is caused by this rotation. 

In Ref. 1, Senator constructed periodic (rotational) solutions by 
means of a perturbation technique, based on small reciprocal of ro- 
tational inertia. However, he did not analyze the stability of the periodic 
solutions directly, but proceeded in a somewhat different manner. Thus, 
he introduced a van der Pol type transformation, but imposed a sub- 
sidiary condition on the slowly varying functions of time which differs 
from the one usually imposed in the method of averaging. He then 
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made assumptions regarding the order of smallness of various deriv- 
atives, and dropped all second order terms from the equations for the 
slowly varying quantities, obtaining what he called averaged equations. 
The stationary solutions of these averaged equations correspond to 
periodic solutions of the original system, and he analyzed the stability 
of the stationary solutions on the basis of the corresponding linearized 
variational equations. 

It is the purpose of this paper to show how the stability condition 
obtained by Senator may be derived rigorously for sufficiently large 
values of rotational inertia. This is done by taking the angular position 
of eccentric mass as the new independent variable, constructing periodic 
solutions of the resulting third order system, and then analyzing the 
linearized variational equations corresponding to them. Perturbation 
procedures, based on the small reciprocal of rotational inertia, are used. 

An asymptotic solution is also obtained which is more general than 
a periodic solution, in that the averaged rotational speed is a slowly 
varying function, rather than constant. However, this asymptotic 
solution is not completely general, in that the transients in the trans- 
lational motion, which decay on a much faster scale, are not included. 
The asymptotic solution nevertheless provides insight into the manner 
in which a stable periodic solution is approached, and the analytical 
results are borne out by some numerical calculations. 


II. PERIODIC SOLUTIONS 


The equations of motion, in dimensionless form, for the system under 
consideration are (from Ref. 1), 


2 6\? . *6 
29 dé d* 


Here 7 is dimensionless time, a, b, p and £ > O are constants, and 
e > 0, the reciprocal of dimensionless inertia, is a small parameter. 
Also, u is the dimensionless translational displacement, and 6 is the 
angular position of eccentric mass. Instead of dealing with the fourth 
order system (1) and (2), as did Senator, it turns out to be more con- 
venient to take @ as the new independent variable. Accordingly, defining 


_ dé 


ae (3) 
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the third order system 


Bevis ern naar 
a “. + ea cos gon au) + «aX cos pe = e(p — BQ) (5) 


is obtained. 
A periodic solution to (4) and (5) is sought in the form 


U = UO, ©) = U(A) + eu,(0) + €U2(8) af ee (6) 
Q = Q(8, €-) = wo + €% (6) + €'02(6) + +++ (7) 


where u;(6@) and Q;(@) are periodic in 6, with period 27, and wo is a 
constant. Substitution of (6) and (7) into (4) and (5), and comparison 
of the lowest powers of e, leads to 

















wd i + 2a, HH + nie = awh sin 4 (8) 
dQ dvu 

Wo “7h. 7) 76° = (p 72 bw). (9) 

The periodic solution of (8) is 
Uy = awgAgl(1 — ws) sin 6 — 2fw cos 6] (10) 

where 
= [(1 — 9)” + 40°a9]. (11) 
In order that 0,(6) should be periodic, it is necessary from (9) that 

du 

Dp = bw + at rT = bw + a” fw) Ap = p* (wo) (12) 


using (10). ( )ay denotes average over a period 27 of 6. Equation (12) 
gives a relationship between w, and p, the dimensionless stall torque, 
and this relationship is depicted graphically in the figure for a = 0.707, 
¢ = 0.2, b = 0. It is noted that w, is a triple valued function of p in 
part of the range. Senator concluded from his analysis that the middle 
branch corresponds to unstable periodic solutions, while the outer 
branches correspond to stable ones, a result verified in this paper. 
Now, from (9), (10), and (12) it follows that 


Q, = {w, — fa *weAg[(1 — w?) cos 26 + 2two sin 26} (18) 


where w, is a constant, which is to be determined from the condition 
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that 0,(6) should be periodic. It is clear as to how the higher order 
terms in the expansions in (6) and (7) may be obtained, but they will 
not be needed in the subsequent analysis. The periodic solutions in (6) 
and (7) are equivalent to those derived by Senator as periodic solutions 
of (1) and (2). It is necessary, of course, to perform a quadrature of 
equation (3) in order to obtain a relationship between 6 and r. 


TI. STABILITY ANALYSIS 


The variational equations corresponding to the periodic solution 
a, 2, given by (6) and (7), are formed by substituting 


=(@+£), G=(@+7) (14) 
in (4) and (5), and linearizing in — and y. Thus, 
o@h yon tt, 4 oad e+ te 


~ dQ dé (3 (3 dy dQ "| = ees 
+ eT, oy el + a cos 6 ant a9 7 = 2aQsin 6n, (15) 


(1 + ca cos 9 M#)(q 40 a, a 1) + cad eos 9 


2~ 


+ 2eaQ cos 0 n + ea cos 0 as + ebn = 0. (16) 
Equations (15) and (16) are linear equations with periodic coefficients, 
and the form of solution is known from Floquet theory.” Moreover, 
if all the characteristic exponents of the variational equations have 
negative real parts, then the periodic solution @, 2 is asymptotically 
stable. The behavior of the characteristic exponents will be analyzed 
ford <e <1. 

The limiting case e — 0+ will first be considered. In this case, from 
(6) and (7), = w, and @ = u)(6) so that, from (15) and (16), dn/d@ = 0 
and 

O 

Wo a + 26 “o 76 

Hence one of the characteristic exponents is \) = 0, and the remaining 
two characteristic exponents satisfy 


(wodo)” + 2E(worr) + 1 = 0 (18) 


and hence have negative real parts, since £ > 0 and w, > 0. For suffi- 


dé dU die) 


E 4 £ = 2(awysin 9 — wp 16° —¢ 6 (17) 
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ciently small ¢, these real parts will remain negative, so that it suffices 
to investigate the characteristic exponent which vanishes as e > 0. 

In the light of Floquet theory, a solution of (15) and (16) is sought 
in the form 


E= e"P(6), 9 = eQ(6) (19) 
where P and Q are periodic in 6, with period 27, and 
A=ea,terwt--: (20) 
P(8) = Po(6) + ePi(0) + € P28) + ++: (21) 
Q(9) = Qo(8) + «Q:(8) + €Q(8) + ---. (22) 


It is a straightforward matter to substitute from (6), (7), and (19)-(22) 
into (15) and (16), and to compare like powers of ¢. In particular, it 
is found from (16) that dQ,/d@ = 0. Omitting a multiplicative constant 
and taking Q) = 1, it is then found that 


Wo ae + 2a oy ae 2600 Fy oe = 209 on + P, = 2aw, sin 0 (23) 
and 
a1 4.) 4 20 


+ 2aw, Cos ge 3° + aw, cos 6 ae +b=0. (24) 


Now, in order for Q,(@) to be periodic, it is necessary from (24) that 


2 
where 
0 = (w Po + 2uUo) (26) 
is are in 6, with period 27. But, from (23), 
we ine + Sw Tp o to Ag 2( oat sin 6 + fa, oe -+- us) (27) 


and uy is given by (10) and (11). Straightforward calculations lead to 
Ro = 2a Aa{[(L — w0)*(2 — wo) + 4¢%wo(1 — 2u9)] sin 6 
— fw [(1 — w)(5 — wo) + 12¢°%w5] cos 6}. (28) 
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Thus, from (25), 
wd, + b+ a’ fwo AGl(L — w5)(5 — wo) + 12f%5] = 0. (29) 


However, as may be verified from (12), using (11), equation (29) 
may be written in the form 


d * 
Wor + rie = 0. (30) 
0 


Since the sign of \ in (20) is determined by the sign of 2,, for suffi- 
ciently small « > 0, it follows that the periodic solution @, @ is asymp- 
totically stable if dp*/dw,) > 0, and is unstable if dp*/dw,) < 0. That is, 
the middle branch of the figure corresponds to unstable periodic solu- 
tions, while the outer branches correspond to asymptotically stable 
ones, provided that « > 0 is sufficiently small. 


IV. MORE GENERAL SOLUTIONS 


In this section a more general solution of (4) and (5) is constructed, 
for 0 < « <1. Thus an asymptotic solution is sought in the form 


Vow, 0) + ,(w, 6) + Ev, ie ee (31) 
w + ew,(w, 6) + ww, 6) + ++: (32) 


where v;(w, 6) and w;(w, 6) are periodic in 0, with period 27, and 


U 


Q 


ll 


ms = €9,(w) + €'go(w) +++: (83) 


This procedure may be regarded as a variant of the method of aver- 
aging.” The above solution is more general than the periodic solutions 
constructed previously, since the latter correspond to the case in which 
w is constant, rather than a slowly varying function of 6. However, 
this solution is not completely general, since the initial transients in 
(4) are not taken into account. 

Substituting (81) and (82) into (4) and (5), using (83), and comparing 
the lowest powers of «, it follows that 


oO 2p MO 9 = au? sin 6 (34) 


and 
(2s + w) + aw” cos 0s = (p — ba). (85) 


The periodic solution of (34) is 
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Vp = aw A(w)[(1 — w’) sin 6 — 2%w cos 6] (36) 
where 
A(w) = [(. — w*)* + 4¢°%o7}*. (37) 


In order that w, should be periodic, it is necessary from (85), using 
(36), that 


wgi(w) = [p — bw — a°fw°A(w)]. (38) 


Then w,(w, 0) may be found from (85), to within an arbitrary function 
of w. This arbitrariness is usual in averaging procedures, and may be 
removed by requiring (w;(w, @)).» = 0, so that, from (32), w is the aver- 
aged value of Q. Higher order terms in the asymptotic expansion (31) 
and (82) may be obtained in a systematic manner. 
Now, from (33) and (88), 

dw _«€ _ m* 2 

16 = PT P*@)) + Of) (39) 
where 

p*(w) = bw + a’tw°A(w). (40) 


As previously remarked, the case in which w is constant corresponds 
to the periodic solutions constructed earlier. From (11), (12), (87), 
and (40), it follows that w» is the lowest order approximation to a 
stationary solution of (89). If w ¥ w., equation (89) determines, to 
lowest order, the slow variation of w with 6. The direction in which w 
changes is determined, to lowest order in ¢, by the sign of [p — p*(w)], 
and is illustrated in Fig. 1 for p = 0.45, to which there correspond 
three values of w, , denoted by wo; , Wom, aNd wo, . 

Under more general initial conditions similar results should hold, 
for sufficiently small «, provided that the initial value of Q is not too 
close to Wom. This is because 2 does not change significantly, for suffi- 
ciently small ¢«, during the time in which the initial transients in the 
translational motion die out. 

A partial check of these analytical results was made by Senator,* who 
carried out some numerical solutions of (1) and (2). With a = 0.707, 
¢ = 0.2, b = 0 and e = 0.1, he chose initial conditions consistent with 
the unstable periodic solution corresponding to p = 0.425, that is, the 
periodic solution corresponding to p*(wom) = 0.425. He then carried 
out numerical solutions of (1) and (2) for p = 0.45 and p = 0.4. He 
found that for p = 0.45 the solution approaches the periodic solution 
corresponding to wo, in the figure, that is, to p*(wo,) = 0.45, while for 
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Fig. 1 — Stall torque vs. averaged rotational speed. 


p = 0.4 the solution approaches the periodic solution corresponding 
to p*(wo,) = 0.4. These results are consistent with our analytical re- 
sults. Moreover, the number of cycles required before the solution 
settles down to the appropriate periodic solution was somewhat larger 
in the case p = 0.45 than in the case p = 0.4. This is consistent with 
(39), and the presence of the factor (1/w) multiplying [p — p*(w)] 
therein. 
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The Capacity of Linear Channels with 


Additive Gaussian Noise 


By R. K. MUELLER and G. J. FOSCHINI 


(Manuscript received August 8, 1969) 


The standard method of computing the mutual information between 
two stochastic processes with finite energy replaces the processes with their 
Fourier coefficients. This procedure is mathematically justified here 
for random signals w,(w) square-integrable in the product space t X w 
where t « [0, T] and w ts an element of a probability space. A natural 
notion of the signa field generated by w,(w) ts presented and it is shown to 
coincide with the sigma field generated by the random Fourier coefficients 
of ww) in any complete orthonormal system in L.[0, T]. This justifies 
the use of Fourier coefficients in mutual information computations. 

Capacity ts calculated for finite and infinite-dimensional channels, where 
the output signal consists of a filter (general Hilbert-Schmidt operator) oper- 
ating on the input signal with additive Gaussian noise. The finite-dimen- 
sional optimal signal is obtained. In the infinite-dimensional case capacity 
can be approached arbitrarily closely with finite-dimensional inputs. The 
question of the existence of an infinite-dimensional signal which achieves 
capacity 1s considered. There are channels for which no signal achieves 
capacity. Some results are obtained when the noise coordinates are inde- 
pendent in the eigensystem of the filter. 


I. INTRODUCTION 


In this paper, we attack a general form of the classical problem of 
determining the capacity of a linear channel with additive noise. 
Structurally we have 


n= i CG Dade (1) 


where the random signals, noise [7,(w)], input [s,(w)], and output 
[r,(w)] are all defined on 0 S$ ¢ S T. All signals as well as the kernel of 
the channel operator are assumed square integrable in the appropriate 
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product spaces. The noise process, the channel operator, and an average 
power restriction on s,(w) are assumed to be given. In Section III we 
begin by defining the capacity of a channel. Our definition is motivated 
by, but is not a special case of, the generalization of Shannon’s notion 
of capacity that has been indicated by Kolmogorov. The argument for 
the naturalness of our definition is that any of the above processes can 
be replaced by their random Fourier coefficients from any expansion 
using complete orthonormal functions in Z,[0, T]. We solve the above 
problem when n,(w) is Gaussian and independent of s,(w). In Section IV 
we show that for finite-dimensional inputs there always exists an s,(w) 
for which capacity is achieved and we find it. The infinite-dimensional 
case is solved in Section V as a limit of finite-dimensional cases. 


II, FUNDAMENTALS 


Fundamental to the notion of capacity is the notion of mutual infor- 
mation. We begin with Kolmogorov’s definition of the mutual infor- 
mation of two event o-fields contained in a universal o-field. Let @ and @ 
denote two sub o-fields of a o-field Sp in a probability space (Q, Sq, P). 
Let a and 6 denote arbitrary partitions of 2 into a finite number of @ 
and ® measurable sets A and B. The mutual information J(@, ®) of @ 
and @ is 


- P(A CB) 
I(@, B) sup pa 2d, P(A ( B) log, P(A)P(B) (2) 
We define 0 log 0 = 0. This sum does not decrease as a and £ are re- 
fined. It can be shown that [(@, ®) = O with equality if and only if 
@ and @ are independent. The nonnegativity and other important 
properties of J are presented in Ref. 1. 

Let X be a measurable space with o-field denoted by D. A function 
¢(@) from (Q, Sq, P) to X for which each D « D has a preimage in Sy 
is called a measurable function. 

Let T be an arbitrary index set and let E* denote the real line. Endow 

Il,.7 E’ with the product topology and consider its measurable sets 
to be the smallest o-field containing the topology. We are interested 
in measurable functions from 2 to II,,7 EZ’. For our purposes T is either 
countable or a real compact interval. 

Suppose £ and 7 are measurable functions from 2 to I,,7 HE’. Then 
by the mutual information of ~ and 7, I(é, 7), we mean the mutual 
information between the smallest o-fields with respect to which & and y 
‘are measurable. We denote these respective o-fields by @; and @, : 
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Let ¢(w) denote any measurable function from Q to I],,7 E’. We 
define the probability distribution P, of {(w). The domain of P; is the 
measurable sets in II,,7 EH’. Let Q be such a measurable set. Then 


PQ) = Plow :f(@) e Q}. (3) 


If and 7 are each measurable functions from 2 to II,,7, Hand ,.7, E” 
respectively then (£, 7) is a measurable function from Q to Il,,7, EZ’ X 

II,.7, EZ’ and its distribution function is denoted P;,, . It is called the 
joint distribution of £ and 7. We can now give an alternate definition 
of mutual information between & and 7. Let y(6) denote arbitrary 
partitions of I,.7,£’(U,.7,H") into a finite number of measurable 
sets C(D). The mutual information I(é, 7) is 


I, 9) = sup DPC X D) oe BI. 


Recall that the inverse image under a measurable function of a o-field 
is a o-field. So it becomes apparent that the two definitions for I(é, 7) 
are equivalent. 

We review without proof some fundamental propositions that will be 
of use to us later. The following is a result of work by I. M. Gelfand, 
A. M. Yaglom, and A. Perez. 


Theorem: If P;, ts not absolutely continuous with respect to the product 
measure P; X P, then I(&, n) = ©. If P;, ts absolutely continuous with 
respect to P; X P,, then letting dP;,/d(P: X P,) denote the Radon- 
Nikodym derivative of P;, with respect to P; X P,, we have 


re.9) = [tos ees | oP. 6) 


Proof: See Ref. 2. 
Theorem: Let A be a linear transformation in a k-dimensional vector 
space and let & be a k-dimensional random vector. Then 


I(é, 1) 2 I(Aé, 2) (6) 


holds for any random vector n, with equality tf the transformation A is 
nonsingular. 





Proof: See Ref. 3. 
Theorem: If I(&, &) < ©, then P, ts purely atomic. 
Proof: See Ref. 4. 


84 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1970 


Theorem: If & = (&,, &,---), then 
TI, n) = lim Jig , ee En) nl: (7) 
Proof: See Ref. 4. 


III. MUTUAL INFORMATION BETWEEN TWO PROCESSES IN L2{(Q, So, P) X 
((0, 7], L, m)} 


Let &,(w) be square integrable on ¢ X w. We term £,(w) a stochastic 
process. Notice that it differs from the standard definition of a stochastic 
process in two ways. First, it is an equivalence class of equal almost 
everywhere functions in (¢ X w). Second, not all functions in the equiva- 
lence class are stochastic processes in the sense of Ref. 5.; that is, for 
each ¢ we do not have a random variable but only for almost all ¢. 
We assume /{ £,(w)} = 0. By Schwarz’s inequality and Fubini’s theorem 
it follows that E’{ &,, (w)&,,(w)} « L(t X t). If n,(w) and ¢,(w) are processes 
of the same type as &,(w), I[7,(w), £,(w)] is not well defined since @, 
and @; are not well defined. Because of the central role of these processes 
in modeling random signals with a finite average power we make @, 
and @; and hence J[n,(w), ¢,(w)] meaningful here. We need to appeal to 
the following: 


Theorem (I. Riesz): Let f, converge in measure to f. Then there exists a 
subsequence f,, converging to f almost everywhere. 


Proof: See Ref. 6. 


Suppose f, converges in mean square to f. Since convergence in mean 
square implies convergence in measure, the limit of the subsequence 
guaranteed by Riesz’s theorem is f in the sense that the limit and f agree 
almost everywhere. This last comment is important since Kolmogorov 
has given examples of functions g which possess an orthogonal expansion 
- J, Converging in mean square to g, yet pointwise almost everywhere 
convergence does not occur. 

Unless stated otherwise all o-fields mentioned in the remainder of 
this section are assumed to be completed. The following new definition 
is the key to making £,(w) meaningful in the information theory sense. 


Definition: By the o-field @ generated by £,(w) we mean the smallest 
o-field @ satisfying &,(w) is (Q, @) X ([0, T], Z) measurable, where L 
is the sigma field of Lebesgue measurable subsets of [0, 7]. (This state- 
ment is definitive since &,(w) is (Q, Sc) X ([0, 7], L) measurable and 
the intersection of o-fields is a o-field.) 
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Proposition I: Suppose 
DX a:w)g.() = &@) (8) 


721 
in the mean square sense in the product space (where a;(w)=J £,(w),(t) dt 
and ¢,;(t) are orthonormal on [0, T]). If a@) = [ai(w), a2(w), +++] then 
Q: ca! Qa . 


Proof: Since the expansion converges to £,(w) in mean square in the 
product space, it converges in measure. By I. Riesz’s theorem we can 
findn, < no < --- so that 


lim [a@)di() +--+ + dudu()] = &@). (9) 


The sum and product of measurable functions is measurable so that 
each partial sum is (Q, @,) X ({0, T], LZ) measurable. The limit of 
measurable functions is measurable so &,(w) is (Q, @,) X ([0, T],Z) 
measurable. Thus @; C @, . 

Next we project &,(w) on ¢;(¢) to get 


ails) = f &(o)6.(0 a. (10) 


By Fubini’s theorem a;(w) is measurable with respect to every o-field 
® for which £,(w) is (Q, 8) X ([0, T], Z) measurable. But this is true for 
each 2, So @, C Q: . 

Proposition I is of paramount importance. In the sequel it enables 
us to replace £,(w) by a(w) when computing mutual information. 

It would seem appropriate to express @; without reference to an 
expansion. The following proposition accomplishes this. However, our 
proof does resort to an expansion of £,(w). Because the proof is similar 
to the proof of proposition I, we omit it. 


Proposition IT: Let {&(w)} denote the class of functions in &,(w). Then 
Q; 1s the smallest o-field containing (\, Q,« [Here we have the only ap- 
pearance of possibly noncomplete c- fields (the @;.)]. 

We can now define capacity of our noisy linear channel. Let S denote 
a finite average power restriction on s,(w). Then the capacity of the 
channel is defined as the supremum of I[s, (w), 7,(w)] where the supremum 
is over all s,(w) satisfying 


al 5 is si(w) a| <8. (11) 


We say &,(w) is Gaussian if the linear functionals 


86 LINEAR CHANNEL CAPACITY 


T 
[ seo at {$(t) e Ls[0, T)} 
are all Gaussian random variables. 


IV. THE FINITE-DIMENSIONAL CASE 


For a random variable 7 possessing density p, the quantity 


n(n) = — | py los p, (12) 


arises often in mutual information studies. It is called the differential 
entropy of 7. 
The following theorem is proved in (Ref. 7). 


Theorem: Let p. be the density of a k-dimensional random variable u. 
To maximize 


nu) = —| p. logp. 


subject to the conditions that the mean and dispersion matrix have given 
values u and I, choose the normal density 
Qu) = 2? | TF exp [E(u — wT" — w)], 


which satisfies the conditions. . 
We prove a corollary necessary for the sequel. 


Corollary: Let p, be the density of a k-dimensional variable u. We want 
to choose p, to maximize 


h(u) = |. log Du 


subject to the conditions that the mean is uw and the dispersion matrix 
satisfies the constraint that tts trace ts less than or equal to ST. The solution 
is to choose p, to be Gaussian with mean u and covariance ST/kI, where 
I is the identity matrix. 


Proof: From the preceding theorem we only need to consider Gaussian 
densities. For a Gaussian density we can write the formula 


hu) = 2 log 27e + 3 log | T |. (13) 


Maximizing h(u) is equivalent to maximizing | I |. Now by the geometric 
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mean—arithmetic mean inequality 


prj s | Seeee] (14) 


with equality if and only if y1. = Yoo = °° * Yee - 

Now we are ready to consider the finite-dimensional version of the 
problem of finding the optimal power-restricted signal s,(w) which 
maximizes the mutual information between it and the output 7,(w). 
By what we have shown in the previous section we can replace these 
processes by their Fourier coefficients when computing mutual infor- 
mation. More specifically let n,(w) be a finite-dimensional Gaussian 
process of dimension k. Let G denote a nonsingular operator on EZ” and 
let s,(w) be a k-dimensional process that is independent of n,(w). Suppose 
that the distribution of n,(w) is absolutely continuous with respect to 
Lebesgue measure in Z*. We want to find s,(w) such that its distribution 
is absolutely continuous with respect to Lebesgue measure in E* and 
I[s,(w), Gs,(w) + n,(w)] is maximized subject to 


a [ " s2(e) ut| < Sr. 


Now by the theorem concerning linear transformations of random 
vectors stated earlier, J[s,(w), Gs,@) + ,@)] = I[s,(w), s,(w) + 
G-*n,(w)]. Define 7,(w) as n,(w) = G'n,(w) and let 


= | and sf = fl 
al li 


be coordinates of 7,(w) and s,(w). 
Then 


T[s.(w), 8) + a(w)] = | pew. Fat (9) Neeoreercraeas Bateah ot 
Dsk+nkPsk 


= | essa. log cers + A(s* + 7"). 
Introducing the transformation 
k k k 
s" s° 


into the above integral and using the fact that s" and y* are independent 
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we have 
I(s", s* ai 1’) 
= | pay log 2 + At + at) = —Wal) + AGE + 1. (15) 


Since h[n,()] is not a function of s,(w), we have reduced the problem 
to that of maximizing h[s,(w) + G~'n,(w)] subject to 


T 
Bf sj) < 87. 
0 


Now 7, (w) is Gaussian and we know from the corollary stated earlier 
that h[s,(w) + G~*n,(w)] subject to the above constraint is maximized 
by a Gaussian process. Thus, without loss of generality, s,(w) can be 
assumed to be Gaussian. We seek I’, the covariance of s,(w) so that 
| rT, + G°T,(G")' | is maximized, since this maximizes h[s, (w) -+ 
G-'n,(w)]. Let us assume, without loss of generality, that G"T,G"" =T, 
is diagonal. Thus the problem is to maximize 


r mi i ‘| 
“lead 


subject to I, a covariance matrix with trace (T,) S ST. Since we are 
maximizing a continuous function over a compact set, we know that 
the maximum exists. 

We use induction to show that the optimal I, is diagonal. For m = 1 
the statement is a trivial one. For m > 1 it shall be convenient to 


partition T, so that 
T..= " _ 
eS; 


ro 
r=f,+ oe ; 
le J 


Now using some standard results on determinants (see Ref. 8, p. 46), 
it follows that 


Let 
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intr,ja= [mtn 7’ 


= (m1 + ) det T — et T)y’T'y. 
| oy pr 





(16) 


Note that both I and I~’ are positive definite. It is optimal to choose 
+ = 0 to maximize the second term. The first term is also optimal if T' is 
diagonal. This follows since any nondiagonal T with trace }-*_, yi: = 
ST — y,, has determinant less than or equal to some diagonal matrix 
with trace equal to ST’ — y,, by induction. We now optimally select 
the diagonal elements of T, . If an e > 0 of ST is to be put on a diagonal 
element of I, , it is optimal to add it to min {(y;; + 7;),¢7 = 1,--- ,k} 
so that it will have the largest possible multiplier in the determinant of I. 


V. CAPACITY FOR THE INFINITE-DIMENSIONAL CASE 


We turn to calculating the capacity in the situation where there can 
be an infinite number of Fourier coefficients of s,(w) and y,(w) and where 
the channel G is an infinite-dimensional Hilbert-Schmidt operator. 

Define 


a= [ 6a» 


where G(f, 7) « L,(t X 7). Let {¢;} be a complete set of orthonormal 
eigenfunctions for G * @ and let {\;} be the associated eigenvalues. 
Define Go; = y; ; then (Y; , ¥3) = (Go: , Gb;) = @: , G * Go;) = i 5:5 
and so {y;/\?} is an orthonormal set. We use r and 7 to denote the 
infinite vector of Fourier coefficients of 7,(w) and 7,(w) in the system 
{y;/d2} while § denotes the infinite vector of Fourier coefficients of 
s,(w) in the system {¢;}. Let r* denote the first k coefficients of r and 
define 8” and 7” similarly. Let D be the doubly infinite diagonal matrix 
with \? as the zth diagonal element and define D, to be the k X k sub- 
matrix of D with indices less than or equal to k. Then r = Dé + 7» and 
r= D,s* + 7’. 

We first show that if an optimal input signal exists, then there is an 
optimal Gaussian input signal. We shall need the following lemma. 


Lemma: For any signal 8, lim I(r’, 8°) = I(r, 8). 
Proof: We know that lim; lim, I(r’, 8“) = I(r, §). As stated earlier 
this is proved in Ref. 4. Now 


I(r’, 8’) = hr’) + [pes log 
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where r* = D,§* + 7‘. If j = 3, 





| ese log ae = | pve log Pi = i log Pri + 
Thus I(r‘, 8’) = I(r’, 8*) for j > ¢; then lim,..,, I(r’, §’) = I(r’, 8) = 
I(r’, §). Finally lim,;..., I(r’, 8) = lim, I(r", 8) = I(r, 8). 


Alternately this lemma can be proved by extending some results of 
Ref. 3 to the infinite-dimensional case. 

We now show that if an optimal signal exists for the infinite-dimen- 
sional case, then the optimal signal can be assumed, without loss of 
generality, to be Gaussian. 


Proposition III: If 8, is a non-Gaussian optimal signal then 8, , the 
Gaussian signal with the same covariance matrix as 8, , is optimal. 


Proof: Clearly I(r} , 8) S I(r} , 8) for all & since the Gaussian process 
is optimal for a fixed covariance matrix in the finite-dimensional case. 
Thus I(r; , §,) = lim, I(r’, 8) S I(re , 8) = lim, I(r} , $9). 


Proposition IV: The capacity of the infinite-dimenstonal channel is the 
limit of the capacities of the k dimensional truncated approximation of the 
infinite-dimensional channel. 


Proof: Let C, denote the capacity of the k dimensional channel and let 
C = lim C, . We claim the capacity of the infinite-dimensional channel 
is C. It is evidently at least C. Suppose a signal s,(w) exists satisfying 
the constraints with mutual information, I(r, §) greater than C. Then 
I(r, 8°) S C;, since &* satisfies the power constraint. Thus I(r, 8) S C, 
a contradiction. 


Corollary 1: There exist finite-dimensional signals whose resulting mutual 
information is arbitrarily close to the capacity. 


Corollary 2: If C, ts constant for all k larger than some integer 1, then 
the | + 1-dimensional optimal signal is optimal for the infinite-dimen- 
stonal case. 


5.1 Limiting Covariance Matrices and Optimal Signals When {n;:} Is 
Independent 


It is not always true that some input signal achieves capacity in the 
infinite-dimensional case. We first prove this. Then we study the special 
case when {;} is independent in the {y;/\?} system. This case may be 
of marginal interest insofar as a model of a realistic system. However 


LINEAR CHANNEL CAPACITY 91 


it is mathematically tractable and hence serves as a good testing ground 
for intuition into more general behavior. 

We now show that no optimal input signal exists for the case \; = 
1/7, En?/d; = 1. It is clear that C, = 4 log (1 + ST/k)*. Then the 
capacity is: C = 2 lim... log (1 + ST/k)* = ST'/2. If there exists an 
optimal signal s, I(r’, &°) — ST/2. But I(r’, &‘) = 4 log | Ts: + J; |, 
where I; isanz X didentity matrix. Then I(r’, 8’) S 3 >oi_, log (1+ Zs”) 
and lim I(r’, &°) S$ 4 >02., log (1 + Es?). Recall that >°%., Hs? = ST 
by assumption. We show that >>%., log (1 + Es?) < ST. Since Es? = 
log (1 + Es) with equality if and only if Zs? = 0, lim I(r’, 8’) S 
4 Doe, log (1 + Es?) < § DOR, Bs] = ST/2. 

Although an optimal signal does not always exist, we can say when 
it does exist in the special case when the {7,;} are independent in the 
system {y;/d3}. It will turn out that {Tj}, the sequence of finite- 
dimensional optimal covariance matrices for s, converges in some cases 
to an optimal solution and in other cases the limit is not optimal. The 
diagonal matrix with a; = (1/\;)E'n7 on the 7th diagonal element com- 
pletely determines whether or not an optimal limit is reached. 

We define the order of minima of a sequence {£;}?_, as follows. The 
order is 0.5 if no smallest element in {£;}$_, exists. If 17, is defined to be 
the set of smallest elements in {£;}$., and Card (M,) = + ©, the order 
of the sequence is 1. If Card (M,) < + but the set {U & — M,} 
has no least element, the order is 1.5. If the set {U & — M,} has M, 
smallest elements and Card (M.) = +, the order is 2. If Card (M.) < 
+o but the set 


A 


2 
j=1 


has no least element then the order is 2.5, and so on. If the sequence 
is not assigned a finite order of minima, the order is infinite. 

If the order of minima of {a;} is 0.5, {f';.} — [0]. To see this we need 
only consider diagonal elements of [';. . Suppose for some j and for 
some « > 0, #8} = e in an infinite number of tx . Since no smallest 
element in {a,} exists, there are an infinite number of a; , say {a,-} 
smaller than a; . Then in the optimal covariance matrices where H'8} = 
and the 2’ appear, Hs}. = ¢. But this is not possible with the constraint 
>> Es? < ST. Thus for each j and e > 0, Es? < « in all but a finite 
number of I; . 

If the order of the minima of {a;} is 1, fj. — [0]. This follows since 
it is optimal to put the power on the minima. After some k, only the 
minima will have positive Hs; . Since there are an infinite number of 
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them and the ST is optimally distributed equally on them, fj. > [0]. 

If the order of the minima of {a,} is 1.5, there are two cases. Let 
h = inf {U & — M,} andg = inf {U €,}. If (h — g) Card (M,) 2 ST, 
there is an optimal solution as a limit consisting of H§} = ST/Card (M,) 
for those 7 corresponding to a; e 1, . Otherwise the covergence is to a 
matrix where Hs? = h — g if a; « M, and zero elsewhere, which is clearly 
not optimal. The analysis for other finite order systems are analogo:s 
to the above. Hither (¢) ST is distributed over a finite number of com- 
ponents, in which case the convergence is to an optimal solution, or 
(iz) ST has to be distributed over an infinite number of components, in 
which case the convergence is not to an optimal solution. 

If the order is infinite and we run out of the quantity ST on a finite 
number of components, the resulting finite-dimensional solution is 
optimal. Suppose the order is infinite and we do not run out of ST on a 
finite number of dimensions. Let 6 be the smallest accumulation point of 
{a;}. If not all of ST is used in making 


the limiting covariance is not optimal and no optimal covariance which 
achieves capacity may be constructed. This follows since a finite amount 
of ST must be distributed equally to an infinite number of components. 
If all of the ST is exactly used to make 


2 
Ir; 


te 


the limiting covariance is optimal. Before proving this we give an 
example of such a case. 
Let 
oe OED) a one 
Me? Gael Bn: = Gap 
and assume the 7; are independent. Then a; = 1 — 1/(i + 1)°. To 
bring all components 

















Er: 
Ni 
to 1 we need 
— 1 
sf = 2. GED? 


and we are then in the case considered above. 
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We now show that the limiting covariance matrix for the case when 
there is just enough S7' to bring 


2: 
Er; 


Wika 


is optimal. Let T,, be the limiting matrix with the corresponding Gaus- 
sian process §, . Let r, = Dé, + . We show that I(r}, 8f) ~ C,k— «. 
Now suppose $" is optimal for k-dimensions. Then 


k ok 
Fo, 8) — 16 8) = ne) — ne) = Flog | e+ 2] Thr, 


k 
—Llog 6 TJ 2, . (17) 
t=1 


Here 6(k) equals that part of ST not used in the matrix I,, in the first 
k-dimensions. Clearly we are assuming that the smallest elements of a; 
appear first. Notice that 6(k) — 0 ask — o. Then 


re, #) — ret) =Eiog[1 +8] sas 


for k sufficiently large. Then 
lim I(r*, 8°) = lim I(r* , 8) = I(r, , 8) = C. (19) 


VI. SUMMARY 


Let us review what we have done. Since we chose to deal with signals 
£,(w) square-integrable on L.{(Q, So, P) X ([0, T], LZ, m)}, we define 
the mutual information between two such signals using Proposition 
II and equation (2) in such a way that it agrees with the mutual infor- 
mation of their Fourier coefficients defined in equation (10). For the 
channel defined in equation (1) with input signals constrained by 
equation (11), we calculate the capacity of the channel. First in Section 
IV the capacity problem is considered when only a finite number of 
Fourier coefficients are nonzero. We use the corollary to the theorem in 
Section IV and equation (15) to show that only Gaussian signals have 
to be considered. Then equation (16) is used to calculate the finite- 
dimensional optimal signal by ‘‘filling the well.’’ In Section V the case 
of an infinite number of nonzero Fourier coefficients is considered. 
We show in Proposition III that optimal signals, if they exist, can be 
chosen Gaussian. In Proposition IV the capacity of the infinite-dimen- 
sional channel is calculated as the limit of finite-dimensional capacities. 
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Finally in Section 5.1 we deal with the existence of an optimal signal. 
In general no optimal signal exists. A special case is examined when the 
noise components are independent in a fixed coordinate system. 


APPENDIX 


Symbols Used 


The following is a list of symbols used throughout the text. 


L,.—the set of square-integrable functions 


n,(w)—a noise process 
n,(w)—a noise process 
s,(w)—the input signal process 
r,(w)—the output process 


G—the linear channel operator 

G*—the adjoint of G 

P,—a probability measure generated by é 

p,—the probability density of the random vector 7 
@—a sigma field 


I(é, 7) —the mutual information between ~ and 7 


h(n)—the differential entropy of 7 
I',—the covariance of s 
E*—uclidean k-space 
|T'|—the determinant of T 

I—the Lebesgue measurable sets 
m—Lebesgue measure 


Card—Cardinality 


E—expected value 
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Theorems on the Analysis of 
Nonlinear Transistor Networks* 


By I. W. SANDBERG 
(Manuscript received August 19, 1969) 


This paper reporis on further results concerning nonlinear equations 
of the form F(x) + Aw = B, in which F(-) ts a “diagonal nonlinear 
mapping” of real Euclidean n-space EK” into itself, A is a realn X n 
matrix, and B is an element of E”. Such equations play a central role in 
the dc analysts of transistor networks, the computation of the transient 
response of transistor networks, and the numerical solution of certain 
nonlinear partial-differential equations. 

Here a nonuniqueness result, which focuses attention on a simple special 
property of transistor-type nonlinearities, is proved; this result shows that 
under certain conditions the equation F(x) + Ax = B has at least two 
solutions for some B € E”. The result proves that some earlier conditions 
for the existence of a unique solution cannot be improved by taking into 
account more information concerning the nonlinearities, and therefore 
makes more clear that the set of matrices denoted in earlier work by Po 
plays a very basic role in the theory of nonlinear transistor networks. In 
addition, some material concerned with the convergence of algorithms for 
computing the solution of the equation F(x) + Aw = B is presented, and 
some theorems are proved which provide more of a theoretical basis for the 
efficient computation of the transient response of transistor networks. In 
particular, the following proposition ts proved. 

If the de equations of a certain general type of transistor network possess 
at most one solution for all B € E” for “‘the original set of a’s as well as 
for an arbitrary set of not-larger a’s’’, then the nonlinear equations en- 
countered at each time step in the use of certain implicit numerical inte- 
gration algorithms possess a unique solution for all values of the step size, 
and hence then for all step-size values it is possible to carry out the algo- 
rithms. 

* The material of this paper was presented at the Advanced Study Institute on 


Toeen Theory (sponsored by the N.A.T.O.; Knokke, Belgium; September 1-12, 
1969). 
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I. INTRODUCTION AND DISCUSSION OF RESULTS 
References 1 and 2 present some results concerning the equation 
F(z) + Ax = B, (1) 


in which, with n an arbitrary positive integer, A is areal n X n matrix, 
B is an element of real Euclidean n-space /”, and F(-) is a mapping of 
E" into L” defined by the condition that* for all a= (a, ,%2,°'',2n) E 


F(a) = [fi(r1), fo(ae), «++ 5 fa(atn)]"* (2) 


with each {;(-) a strictly monotone increasing mapping of E’ into itself. 
Equation (1) plays a central role in the de analysis of transistor net- 
works,** the transient analysis of transistor networks (see Section 1.4), 
and the numerical solution of certain nonlinear partial differential 
equations. 

In Ref. 1 it is proved that there exists a unique solution x of equation 
(1) for each strictly monotone increasing mapping F(-) of £” onto E” 
(that is, for each set of strictly monotone increasing mappings f;(-) of 
E" onto itself) and each B € EZ” if and only if A is a member of the set 
P, of realn X n matrices with all principal minors nonnegative. It is also 
proved in Ref. 1 that equation (1) possesses a unique solution x for 
each continuous monotone nondecreasing mapping F'(-) of Z” into Z” 
(that is, for each set of continuous monotone nondecreasing mappings 
of EZ’ into EZ") and each B € EK” if A belongs to the set P of all real 
n X n matrices with all principal minors positivet. A direct modification 
of the existence proof given in Ref. 1, as indicated in Ref. 2, shows that 
equation (1) possesses a unique solution for each strictly monotone 
increasing mapping [*(-) of E” onto (a1 , 8:1) X (a2, Be) X +++ XK (ans Bn) 
with each @; and B; elements of the extended real line’ (real line) such 
that a; < 8; and each B € E” if (and only if) A € Py, and det A + 0. 
Some network theoretic implications of these and related results are 
discussed in Refs. 1 and 2, where the matter of determining whether or 
not A € P, or A € P is considered in some detail. 


* Throughout the paper the superscript ¢r denotes transpose. 

** See Ref. 1 for a derivation of the equation within the context of the transistor 
de-analysis problem. 

+ There are some interesting applications of this result in the study of numerical 
methods for solving certain nonlinear partial-differential equations, in which A 
has nonpositive off-diagonal terms and is irreducibly diagonally dominant. 

* The numbers a: and B: are members of the extended real line if —o S a 
Soand—wo SPi S o~. 
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This paper presents a proof of a nonuniqueness result. The proof 
focuses attention on a simple special property of transistor-type non- 
linearities. The result shows that under certain conditions equation (1) 
has at least two solutions for some B € E”. In addition, the paper 
presents some material concerned with the convergence of algorithms 
for computing the solution of equation (1) and proves some theorems 
which provide more of a theoretical basis for the efficient computation 
of the transient response of transistor networks. The remaining portion 
of Section I is concerned with a detailed discussion of the results and 
their significance. 


1.1 An Application of the Nonuniqueness Theorem 


The standard Ebers—Moll transistor model, which is widely used, 
gives rise to functions f;(-) which, while continuous and strictly mono- 
tone increasing, are mappings of EZ’ onto open semi-infinite intervals. 
For such f;(-), the results stated above assert that the equation (1) 
possesses at most one solution x for each B € EH” if A € P, ; and if 
A € P, and det A ¥ 0, then equation (1) possesses a solution for each 
B € E"”. Since, as indicated in Ref. 1, A = T~’G with T a nonsingular 
matrix which takes into account the forward and reverse transistor a’s, 
and G@ is the short circuit conductance matrix of the linear portion of 
the network, the condition that det A not vanish is equivalent to the 
rather weak assumption that the linear portion of the network possess 
an open-circuit resistance matrix. 

It is natural to ask whether the use of more-detailed information 
concerning the nonlinearities of the transistor model would enable us 
to make assertions concerning the existence of a unique solution of 
equation (1) for all B © EH” under weaker assumptions on A. In particu- 
lar, can the condition that A belong to Py be relaxed? The first result 
proved in this paper, Theorem 1 of Section II, shows that if the f;(-) are 
exponential nonlinearities of the type associated with the Ebers—Moll 
model, then the condition that A belong to P, cannot be replaced by 
a weaker condition. More explicitly, in Section II a set 5> of mappings 
of &” into E” is defined, and 5$ contains all of the mappings F'(-) that 
correspond to Ebers—Moll type f;(-)’s. It is proved there that if A € P, , 
then for any F(-) © 5S , there is a B € E” such that equation (1) 
possesses at least two solutions. In fact, it is proved that if A GE P, 
and if 6 is an arbitrary positive number, then for any F(-) € 5 , there 
is a B © E” such that equation (1) possesses two solutions such that 
the distance in £” between the two solutions is 6. 

Thus Theorem 1 together with the earlier results mentioned above 
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concerning existence of solutions show that the set of matrices Py plays 
a quite fundamental role in the theory of nonlinear transistor networks. 


1.2 An Algorithm for Computing the Solution of Equation (1) 


Several results which assert that A € P, under certain conditions on 
the transistor a’s and the short-circuit conductance matrix of the linear 
portion of the network are proved in Refs. 1 and 2. In particular, 
Ref. 1 proves that A € P, and hence that A € P,,if A = P-’Q with P 
and @ realn X n matrices such that for aJl7 = 1,2,---,n 

Pii > > | ps; | and qi; > > | Qi; * 


t¥7 


Theorem 2 of Section II shows that a relatively simple and entirely 
constructive algorithm can be used to generate a sequence gO oes 
of elements of Z” that converges to the unique solution of (1) if A = 
P~'Q with P and Q as defined above and each f;(-) is a continuous (but 
not necessarily differentiable) monotone nondecreasing mapping of 


E’ into E'.** 


1.3 Palais’ Theorem, Existence of Solutions of Equation (1), and Algo- 
rithms for Computing the Solution of Equation (1) 


Reference 1 gives two existence proofs concerning equation (1). One 
proof, the more basic of the two, is based on first principles and em- 
ploys an inductive argument in which, with k an arbitrary positive 
integer less than n, the existence proposition is assumed to be true with 
n replaced by k and it is proved that then the proposition is true with n 
replaced by (& + 1). The second proof uses a theorem of R. 8. Palais 
and requires that the f;(-) be continuously differentiable throughout E’. 
More explicitly, Palais’ theorem? asserts that if R(-) is a continuously 
differentiable mapping of E” into itself with values R(q) for qg € LE’, 
then R(-) is a diffeomorphism‘ of E” onto itself if and only if 


() det J, ¥ 0 for all g € LE”, in which J, is the Jacobian matrix of 
R(-) with respect to gq, and 
(zz) || R@ || @ as || ¢ || > 0.7 


* a is proved also that A € Po if A = PQ with pj; > Liz; | pi; |andq;; = 
Dei | Qez | for all 7. 

** A related result given in Ref. 4is not directly applicable here because of assump- 
tions made in Ref. 4 concerning the existence and boundedness of a certain Jacobian 
matrix. 

+ See Ref. 5 and the appendix of Ref. 6. 

t A diffeomorphism of E£” onto itself is a continuously differentiable mapping of 
E” into E” which possesses a continuously differentiable inverse. 

tt Here || - || denotes any norm on £” 
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And the second proof of Ref. 1 shows that, with 
R@) = F@) + Ag 


for allg € E”, the two conditions (z) and (77) are met when A € P, and 
each f;(-) is a continuously differentiable strictly-monotone-increasing 
function which maps E* onto E’ and whose slope is positive through- 
out E".* 

There are some problems which arise in connection with, for example, 
the numerical solution of certain nonlinear partial-differential equations** 
in which one encounters an equation of the form (1) with A € P, and 
det A +¥ 0, but with functions f;(-) which, while continuously differ- 
entiable, are monotone nondecreasing (rather than strictly monotone 
increasing) mappings of EZ’ into E*. We can prove that even in such 
cases equation (1) possesses a unique solution for each BC E” as follows. 
Here the Jacobian matrix of F(q)+Aq exists and is of the form D(q) +A 
in which D(q) is a diagonal matrix with nonnegative diagonal elements. 
Since A € Py, and det A ¥ 0, we have’ det [D(q) + A] ¥ 0 for all 
q € EK". An immediate application of Theorem 3 of Section II shows 
that || F(q) + Aq||— © as || q||— «.t Therefore, by Palais’ theorem, 
F(x) + Ax = B possesses a unique solution for each B. 

Theorem 3 is of use not only in connection with the proof given in 
the preceding paragraph; it also plays a key role in showing that there 
is an algorithm which generates a sequence of elements of E* x, x, --- 
that converges to the unique solution of F(x) + Ax = B whenever each 
f:(-) is twice continuously differentiable on EZ’ and the conditions on A 
and F(-) of the preceding paragraph are satisfied.* 

More generally, if R(-) is any twice-differentiable mapping of E” 
into itself such that conditions (z) and (zz) of Palais’ theorem are 
satisfied, then, with R-’(-) the continuously-differentiable inverse of 
R(-), 2 = R~*(6) satisfies R(x) = 6 in which 6 is the zero element of EZ”, 
and there are steepest decent as well as Newton-type algorithms each of 


* The reasons that two proofs were presented in Ref. 1, with the second proof 
a proof of a somewhat weaker result, are that the arguments needed for the appli- 
cation of Palais’ theorem had already been developed in Ref. 1 and used for 
other purposes there, and it was felt desirable to indicate an alternative approach 
to essentially the same problem. 

** The writer is indebted to J. McKenna and BE. Wasserstrom for bringing this 
fact to his attention. 

+ More yoru Theorem tf shows that there is a vector C € E” such that 
| Fq@) +4 C\|— © as ||q||— ©, which is equivalent to the statement 
concerning hat F(q) + Aq || made above. 

The differentiability assumption here is introduced as a matter of convenience, 
end is certainly satisfied when the f;(-) are Ebers—Moll exponential-type non- 
inearities. 
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which generates a sequence in E” that converges to x. To show this, let’ 
f(y) = R(y) || for all y € EZ" in which || - || denotes the usual Euclidean 
norm (that is, the square-root of the sum of squares). Since condition (7) 
of Palais’ theorem is satisfied, the gradient Vf of f(-) satisfies (Vf) (y) 
~ 6 unless f(y) = 0,* and since condition (iz) of Palais’ theorem is satis- 
fied the set S = {y C E”: f(y) S f(a} is bounded for any x € E”. 
Therefore we may appeal to, for example, the theorem of page 43 of Ref. 
7 according to which for any x“ € E”, for any member of a certain class 
of MEP per ¢(-) of S into £", and for suitably chosen constants yo , 
v1, , the sequence x, x, --+ defined by 


at? = og 4+ yo(e) forall k= 0 


belongs to S and is such that || R(x || > 0 as k > ©. However, since 
R7*(-) exists and is continuous,’ it follows from 


= R-[R(c™)] forall k= 0 
and the fact that R(t) > 06 as k > o, that lim... 2™ exists and 


lim «“ = R™'(6), 


k-00 


t 
which means that x = lim,.,, 7”. 


1.4 Transient Response of Transistor-Diode Networks and Implicit Nu- 
merical-Integration Formulas 


At this point we briefly consider some aspects of the manner in which 
the previous material bears on the important problem of providing 
more of a theoretical basis for numerically integrating the ordinary 
differential equations which govern the transient response of nonlinear 
transistor networks. Although we consider explicitly only networks con- 
taining transistors, diodes, and resistors, the material to be presented 
can be extended to take into account other types of elements as well. 
In addition, we shall focus attention on the use of linear multipoint 
integration formulas of closed (that is, of implicit) type, since such 


* Here we have used the fact that (Vf)(y) = 2J,‘*R(y) for all y € E.7 

+ By Palais’ theorem R(-) is a diffeomorphism of EH” onto itself. 

+The material of the second part of Section 1.3 was motivated by previous 
recent work of the writer’s colleague A. Gersho who made the observation that the 
convergence of an algorithm for the solution of equation (1) could be shown by 
combining results of Ref. 1 with the approaches described by Goldstein.? (See the 
November 1969 BS.T J. Brief by A. Gersho.) 
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formulas are of considerable use in connection with the typically ‘‘stiff 
systems” of differential equations encountered. 

A very large class of networks containing resistors, transistors, and 
diodes modeled in a standard manner is governed by the equation® 


E + TFIC%U)] + I+ GR"GC*U) = BO, t20 6) 


where, assuming that there are g diodes and p transistors, 


@ T=I,QOT,0T.@--- @T,, the direct sum of the identity 
matrix of order g and p 2 X 2 matrices T,, in which 


__ tk) 
T, = : = 
| 


with 0 < af” <1land0 < a® < 1 fork = 1, 2,---,p. 

a) R=R@OROR.@---@R,, the direct sum of a diagonal 
matrix Ry = diag (7, , 7%, -°-: , 7%) With r, = O fork = 1, 2,--- ,q 
and p 2 X 2 matrices R, in which for all k = 1, 2,--- ,p 


(k) (k) (k) 
fe ety rs 
R, = 
(k) (k) (k) 
Ts eae ae 


with ri? = 0, 7; = 0, andr” 2 0. (The matrix R takes into account 
the presence of bulk resistance in series with the diodes and the emitter, 
base, and collector leads of the transistors.) 

(iit) G is the short-circuit conductance matrix associated with the 
resistors of the network. (It does not take into account the bulk re- 
sistances of the semiconductor devices.) 

(iv) F(-) is a mapping of E°*® into E°?*® defined by the condition 
that 


F(x) = [f. (a1), fa(x2), Yon fsoatosGa)| 


for alla € E°?*® with each f;(-) a continuously differentiable strictly- 
monotone increasing mapping of H' into E’. 

(v) C71(-) is the inverse of the mapping C(-), of E°?*® into itself, 
defined by 


C(x) = diag (c; 92, °°" y Conve) t oie diag (71 9T2, °°" 4 Top+a) i (2) 


for alla € E°?*® with each c; and each 7; a positive constant. 
(vt) Bt) is a (2p + q)-vector which takes into account the voltage 
and current generators present in the network, and 
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(vii) u is related to v the vector of junction voltages of the semi- 
conductor devices through C(v) = u for allv € E°?*®, 
Equation (3) is equivalent to* 


u+ f(u, t) = O(2p+0)) t20 (4) 

in which of course 
te aN mmrcy-ls Nt 1 MD\-1lAN—17.\. N/a ms 
J(u) = TRIC W)j + GZ + GR) GC (Wu) — BY (v) 


and 6(25+9) is the zero vector of order (2p + q). 
It is well known that certain specializations of the general multi- 
point formula’’”° 


Yati = >> QYn—k h pa biGn—k (6) 
k=0 =—1 
in which 
Un-. = —flYn-x ’ (n = k)h) (7) 


can be used as a basis for computing the solution of equation (4). 
Here h, a positive number, is the step size, the a, and the b, are real 
numbers, and of course y, is the approximation to u(nh) for n = 1. 

In the literature dealing with formulas of the type (6) in connection 
with systems of equations of the type (4), information concerning the 
location of the eigenvalues of the Jacobian matrix J, of f(u, t) with 
respect to u plays an important role in determining whether or not a 
given formula will be (in some suitable sense) stable. In particular, an 
assumption often made is that all of the eigenvalues of J, lie in the 
strict right-half plane for all ¢ = O and all w. For f(u, ¢) given by equa- 
tion (5), we have 


re — Het 
J, = T diag is + 7,Filgi(us)] 


7 : 1 
a3 R)’G diag roe 
AE TO) GOE e; + tif ilgiu;)] (8) 
in which for 7 = 1,2, --- , (2p + q) g;(u;) is the j* component of C~*(u). 
Thus here J, is a matrix of the form 
TD, + I + GR)"GD, (9) 


where D, and D, are diagonal matrices with positive diagonal elements. 


* Ref. 8 shows that if B(-) is a continuous mapping of [0, ©) into E@?te), then for 
ay initial condition u© € HCrt® there exists a unique continuous (2p ++ 4)- vector- 
ued function u(-) such that u(0) = wu and (8) is satisfied for allt > 0 
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A simple result concerning equation (9), Theorem 4 of Section II, 
asserts that if there exists a diagonal matrix D with positive diagonal 
elements such that* 


(t) DT is strongly column-sum dominant, and 
(ii) DI + GR)'G is weakly column-sum dominant, 


then for all diagonal matrices D, and D, with positive diagonal ele- 
ments, all eigenvalues of (9) lie in the strict right-half plane. This con- 
dition on 7’, G, and R is often satisfied.t 

The subclass of numerical integration formulas (6) defined by the 
condition that b_; > 0 are of considerable use’’’**’** in applications 
involving the typically “stiff systems” of differential equations en- 
countered in the analysis of nonlinear transistor networks. With b_, > 0, 
Yn+1 18 defined implicitly through 


Gray hob_sf(Yn+1 (n+ 1)h) = > AxYn-e +h oS DeGn-% 
k=0 k=0 


in which the right side depends on y,-, only for k € {0, 1, 2,--- , 7}, 
and for f(u, £) given by equation (5), we have 


Yner + Ab-{ TFC Yns1)] + LE + GRY GC Ynss)} = dn (10) 


in which 


an = ds QxYn-k -¢ h > biGn—k + hb_,Bl(n + 1)h]. 


Obviously, the numerical integration formula (10) makes sense only 
if there exists for each n a Yns, © E°?*® such that equation (10) is 
satisfied. 

Let tsi = C7'(Yns1) for each n. Then equation (10) possesses a 
unique solution y,41 if and only if there exists a unique z,,, © E°?*® 
such that 


C(Sn41) + ADT (@as1) + I + GR)" Gaya1] = Ge - (11) 
Since C(@ai1) = C¥ne1 + TF (tas1), in which 
> diag (c, »f2 5 °°" ) Copa) 
and 


7 = diag (7, , 72, 8% y Tapta)) 


*The terms “strongly-column-sum dominant” and “weakly-column-sum domi- 
nant” are reasonably standard. However they are defined in Section II. 
+ See Ref. 8 for examples. 
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equation (11) is equivalent to 
[7 + Ab_, TIF (tae1) + [e + Abi + GR) "Gans: = qn. (12) 


The matrices + and c are both diagonal with positive diagonal ele- 
ments. Thus it is clear that for all positive h 


det [r + hb_, T| ~ 0 


and 


det [c + hb_,(I + GR)7'G] # 0.* 
For all sufficiently small positive h 
[7 + hbiT) [ce + hb_i.U + GR)"G] € P, .t 


Consequently? for all sufficiently small h > 0, equation (12) possesses 
a unique solution for each q, .* However, our interest in equation (12) 
is primarily in connection with “‘large-h”’ algorithms. 

Suppose that det G ~ 0 and that 7~’G © Py for all possible combi- 
nations of a, and a, (0 < a, < 1,0 < a, < 1) for each transistor (see 
Ref. 1 for examples). Then, according to Theorem 6 of Section IT, 
for any particular T and R 


[7 + hb_,T]""[c + hb_,ZI + GR)~’G] = any 


for all h > 0, and hence equation (10) possesses a unique solution 
Yn+1 for all positive values of h. 

An important and general proposition concerning (10) is as follows. 
Suppose that 


PG Gk) Ger, (13) 


= ’ are replaced 


and that condition (13) is satisfied whenever ea,” and a}* 

with positive constants 6‘ and 6(”, respectively, such that 6“ < a\® 
and 6}” < at” fork = 1, 2, --+ , p. In other words, assuming that F'(-) 
is as defined in this section and that F(-) € ¢2?*® (see Definition 1 of 


Section 2.1), suppose that the de equation 
F(x) + T [UT + GR) 'G)cz = B 


possesses at most one solution x for each B € E”?*” for “the original 
set of a’s as well as for an arbitrary set of not-larger a’s.’”’ Then an 


* Here we have used the fact that (J + GR)"!G is positive semidefinite. 

t See Section 1.2. 

tt See Section 1.3. 

+ Alternatively, this conclusion could have been obtained by applying the con- 
traction-mapping fixed-point principle to (10), in view of the fact that each of the 
elements of J. is bounded on u € #@t® andt € [0, ©). 
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immediate application of Theorem 5 of Section II shows that 
[r + hb_, T]""[c + hb_,(I + GR)7'G] EC Ps 


for all h > 0, and hence that equation (10) possesses a unique solution 
Yn+1 for allh > O and all g, © E°?*®. 


Il. THEOREMS, PROOFS, AND SOME DISCUSSION 


Throughout this section, 


(z) n is an arbitrary positive integer, 
(7) Po denotes the set of all real n X n matrices M such that all 
principal minors of M are nonnegative, 

(ii7) real Euclidean n-space is denoted by ”, and @ is the zero 
element of EZ”, 

(tv) v‘” denotes the transpose of the row vector v = (v1, V2, °** 5 Un), 

(v) || v || denotes (>°*_, 0?)'” for allv € E’, 

(vi) if D is a real diagonal matrix, then D > 0 (D 2 0) means that 
the diagonal elements of D are positive (nonnegative), 

(vit) I, denotes the identity matrix of order gq, and I denotes the 
identity matrix of order determined by the context in which the symbol 
is used, and 

(vizi) we shall say that a real n X n matrix WV is strongly (weakly) 
column-sum dominant if and only if for7 = 1, 2, --+ ,n 


m;; > (2) > | mi; |. 


2.1 Definition 1 
For each positive integer n, let §> denote that collection of mappings 
of E” into itself defined by: F € & if and only if there exist for 7 = 


1, 2, --+ , n, continuous functions f;(-) mapping E’ into EZ’ such that 


for each x = (21, %2,°°+, Un)" € EB’, F(x) = [f, (a1), fata), +++, fn(tn)]”, 
and 


(2) int [hile + 8) — file] = 0 
(22) an sae [f(a + 8) — fia)] = + 


for all@ > Oandall7 = 1,2,--- ,n. 


2.2 Theorem 1 


Let F € 5 ¢ , let A be a real n X n matrix such that A EG Py , and let 
5 be a positive constant. Then there exist BC E", x € EB", andy € E" 
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such that 
(2) F(x) + Ax = B, 
(22) Fly) + Ay = B, 
and 
(tit) lx — yl] = 6 


2.3 Proof of Theorem 1 


Since A € P, , there exists’ a real diagonal matrix D > 0 such that 
det (D + A) = 0. Thus there exists a x* € EZ” such that || x* || = 6 
and (D + A)a* = 6. 


Since F € S$ , there exists a x € E” such that 
fi@i) — fia; — 24) = x4 d; 
for all j = 1, 2, --+ , n in which d; is the j diagonal element of D. Let 
B= F(«) + Ag, 
and let y = x — x«*. Then A(z — y) = Ax* = —Dzx*, and 
F@) —-Fy)+4@-y=6 0 
2.4 Remarks Concerning Theorem 1 


If, as in the case of standard transistor models, 
fits) = et — 1 
or 
Leja ee 
with \; > 0, we have, respectively, 
fila + B) — fila) = e'*(e*” — 1) 
or 
Gere) = fle = er Cae) 
and it is clear that for either type of function conditions (z) and (iz) 
of Definition 1 are satisfied. 
In Ref. 1 it is proved that if F(-) € I the set of all F(-) of the form 


(2) with each f,;(-) a strictly monotone increasing mapping of E’ into 
E’, and if A € P, , then equation (1) possesses at most one solution. 
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Thus, using Theorem 1, we see that for each F(-) € M1) § there 
exists at most one solution of F(z) + Az = B for each B € E” if and 
only if A € P,. Similarly, with 9M the set of all F(-) of the form (2) 
with each f;(-) a strictly monotone increasing mapping of E’ onto 
(a; , 8;) with each a; and 8; such that —~ Sa; < 6; S ~,if F(-) € 
Wo (\ 5 and det A + O, then there exists a unique solution z of 
F(z) + Az = B for each B € EL” if and only if A € P,. (The “if” part 
of this statement is proved in Ref. 2.) A parallel development can be 
carried out for equations of the form AF(x) + x = B with A a real 
n X n matrix, F(-) € WW O\ 55, and B € E”. More explicitly, we can 
prove that if F(-) € 0% ( §}, then there exists a unique solution x of 
AF (x) + 2 = B foreach B € E’ if and only if A € P,. 

There may be a temptation to conjecture that whenever F(-) € 91 
s> and A €& P, then the equation F(x) + Ax = B does not possess a 
solution for some B € EK”. The conjecture is false. In fact, with n = 2, 
fila) = e”, fo(a2) = e”*, and 


a-|' al 
1 0 


we have a situation in which (it is easy to show that) there exists a 
solution for all B € E’. Of course here for some choices of B the solution 
is not unique. 


2.5 Theorem 2 


Let P and Q denote real n * n matrices such that 
Dii > 2 | pss | and qj; > 2D | gs | 


for allj = 1,2,---,n. Forj = 1,2,--- ,nletf;(-) denote a continuous 
monotone nondecreasing (but not necessarily differentiable) mapping of 
E' into itself, and let F(x) = [fi(a1), fo(t2), --- » fa(ta)]** for all x € EH". 
Then for each R € E”, there exists a unique x € Ei” such that 


PF(@) + Qz = R, 
and, for any yo © E", x is the limit of the sequence x, x, - ++ defined by 


ys cs DpF(2™) =. Dar” 
yo? + (P — Dr)F@™) + Q — Dale” = R- 


for n = 0, tn which Dp and Dg are diagonal matrices whose diagonal 


elements coincide with those of P and Q, respectively. : 
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2.6 Proof of Theorem 2 


Since the continuous mapping [DpF’(-) + Dg] of E” into E” possesses 
an inverse [DpF(-) + Dg]’, the equation 


PF(«) + Qc =R 


possesses a unique solution x if and only if y = DpF(x) + Doz is the 
unique solution of 


y + PF((D?F(-) + Do) *y] + U(DrF(-) + Da) ty] = RB 


in which P = (P — Dp) and Q = (Q — D,). 

Therefore, by Banach’s contraction-mapping fixed-point theorem, 
it suffices to show that with the metric p(y, z) = >07., | y; — 2; |, the 
operator H defined by 


Hy) = PF((DpF(-) + Do)"y] + Q(DrF(-) + Do)“yl 


for all y € £”, is a contraction mapping of £” into itself. We show this 
as follows. Let y € E” and z € E”. Using the fact that 


a = dol (dpifi(-) + dos) a) + desfil@eifi(-) + do;)7 el] 


for all real a and allj = 1, 2, --- , n, in which dp; and dg; is the j* 
diagonal element of Dp and Dg , respectively, it is a simple matter to 
verify that for all j: 


fil(deifi(-) + dai) Yi] — Fil(drsfi(-) + doi) 25] 


r 


== des ac dpi; (y; a 2), 
and 


-1 = 1 
(dpif(-) + dai) ys — eifil-) + da:) 2; = do; + dpjr; (yi — 2;) 


in which r; = 1 if y; = z; , and, if y; 4 2,, 


pres fl(dp Ji) + doi) ‘Yi] — fildrgfi(>) + doi) Zi) 
, (dp.J,(-) a dai) ‘Ys a (dp;f,(-) = do;) 2; 


Thus 
Hy) — H@) 


i T; Fy js fee WE 
= P diag a} —z)+Qdiag i ae oh — 2) 
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in which r; 2 0. Therefore 
< Gai + zeit) 
pllL(y), H@)) $ max (22+ 2) py, 2) 
in which Cqi = ai | Qi | and Op; = rr | Dii |. Since Tai < da; 
and op; < dp; for all j, there exists a positive constant 8 < 1 such that 
Tai + zeit) < 
Poe ($2 + dp} ~ 
forallr; 20. O 


2.7 Theorem 8 
If A € P, and det A ¥ 0, tf for each j = 1, 2, --- , n:f;(-) 7s @ con- 
tinuous mapping of E* into itself such that 


f;(2;) = 0 for all v; 


or 
f;(2;) > 0 for all v; > Cc 
and 
f;(7;) <0 foralla; < —c 
for some c 2 O, then, with F(x) = [fi(a1), fo(te), +++ , fa@n)] for all 
ze &", 


| F@) + Az|[>@ as |[a||—> . 
2.8 Proof of Theorem 8 
We note that 
| P@) + Az|[> 2 as ||r||> @ 
if and only if 
|| ATF@) +2|| > as |[r|| >. 
With M = A™’, let 
MF (xz) + x = q. (14) 


Since A € P,, we have M € P, .’ Since M € P, , we have’ for any 
y € BE” andy # 6 


yx(My), 2 0 
for some index k such that y, ~ 0. 
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Suppose that F(x) # 6. Then there exists an index k, such that 
fi, (tr, [MF (x)],, 2 0 
with f;,,(a,) # 0. Thus, using (14), 


fu. (We) IMF (@) es + fers Cis) es = Ses (Crs) Qe, 


and 


fer (Cei)€e, S fer (Cede - 


Hither x,, © [—c, c] or not. If not, then f,,(t,,.)%,., > O and | a,, 
| gz, |. Therefore for some index k, , | xz, | S 6, 4 max (c, | gq, |), whether 
or not F(x) = 6. 

Let M“” denote the matrix obtained from M by deleting the k, 
row and column, and let M(.,) denote the k, column of M with the k, 
entry removed. Similarly, let x.) , Ga.) and Fa,)(@,)) denote the 
(n — 1)-vectors obtained from 2, q, and F(z), respectively, by removing 
the k, entry. Then 


M9 Fa.) (to) + 2a.) = Ua) M x.y fir (Ces) 


Since M“” € P, , we can repeat the argument given above. Thus 
there exists an index k, , different from k, , such that 


< 








| Vig | S 6, 4 max (c; | Oks ska) |) 


in which 
| Qck:.k2) | = Max | [Gas — Mf, (tn.) Vt. | 
Irz, 188, 


and J, is the index of the component of x,,,) that corresponds to the k, 
component of x. By continuing in this manner we can determine positive 
constants 6, , 62, °+: , 6, depending only on q, F’, M, and ¢ such that, 
with 6 = max; {6;}, 


[z,|S 6 forall j= 1,2,---,” 
and each 6; depends on g such that for any positive constant a, there 
exists a constant 6;(a) with the property that 6; S 6;(«) provided that 
\| ¢ || S a. Therefore for any a > 0 there is a 8(a) such that || x || < 8(a) 
whenever || g || S @, which implies that || q |] — © as |[a||— ©. O 
2.9 Theorem 4 


Let P and Q denote real n XK n matrices with P strongly column-sum 
dominant. Suppose that there exists a real diagonal matrix D > 0 such 
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that DP is strongly column-sum dominant and DQ ts weakly column-sum 
dominant. Then for all real diagonal matrices D, > 0 and D, > 0, all 
eigenvalues of (PD, + QD.) lie in the strict (that is, open) right-half 
plane. 


2.10 Proof of Theorem 4 


Since the strict right-half plane contains all of the eigenvalues of P, 
there exists choices of D, > 0 and D, > 0 such that every eigenvalue 
of (PD, + QD,) lies in the strict right-half plane. Thus it suffices to 
show that (PD, + QD,) does not possess an eigenvalue on the boundary 
of the complex plane for all D, > 0 and all D, > 0. In other words, it 
suffices to show that (with ¢ = (—1)*) 


PD, + QD, + twl (15) 


is nonsingular for all D, > 0, all D, > 0, and all real constants w. 
Suppose that (15) is singular for some w and some D, > 0 and some 
D, > 0. Then (DPD, + DQD. + iwD) is singular. But DPD, is 
strongly column-sum dominant and DQD, is weakly column-sum 
dominant. Thus M = (DPD, + DQD.) is strongly column-sum domi- 
nant, and, since 
| mj; + tod; | > D2 | mi; | 
ix 
for all 7, in which d; is the j diagonal element of D, it follows that 
det (M + twD) # 0, which is a contradiction. 0 


2.11 Definition 2 


With q and p nonnegative integers such that (p + q) > 0, let 3 de- 
note the set of all matrices M such that MW =7I,Q0M,@M.@-::: 


@ M, with 
— « 1 
and 
Oe <i 
Ox eP <1 
for allk = 1,2,--- , p.* 


* As suggested, if g = 0, then 7 =M:1QM2@---@®M,, while if p = 0, 
then Mf = I. 
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2.12 Definition 3 


With g and p nonnegative integers such that (p + q) > 0, let 3(a) 
denote the set of all matrices M such that M=I,@® M,@M.@-:: 
@ M, with 

__ glk) 
M, = 1 5° | 
o(h) 1 
L ~~ Uf 1 J 
and 
0 < 5 < al” 
0< 5 S af” 
for all K = 1, 2, --- , p.* 


2.13 Theorem & 


Let T € 3, let H be a real matrix of order (2p + q), and suppose 
that M~"H € P, for all M € 3(a). Then 


(T + D,)""(H + D2) © Po 
for all diagonal matrices D, = 0 and D, = 0. 


2.14 Proof of Theorem & 
Suppose that for some D, = 0 and D, = 0 
(T + DD)" + D2) € Po. 
Then there exists” a diagonal matrix D > 0 such that 
(T + D,)"( + D,) + D 
is singular. It follows that 
H+A+TD 
is singular, in which A = D, + D,D. Since 
A+ TD = M(A+D) 
in which M € 3(a), it follows that 
H+M(A+D) 


is singular, and therefore that 


* As suggested, if g = 0, then M =M,@M2@::-@®Mpz, while if p = 0, 
then M = Ij. . 
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MH + (A+ D) 


is singular, which is a contradiction since’ M7’ H € P, and (A+ D)isa 
diagonal matrix with positive diagonal elements.’ 0 
2.15 Theorem 6 
Let M~’*G € P, for all M € 3, and let det G ~ 0. Let R be as defined 
in Section 1.4. Then for any T € 3 
(T + DTT + GR)“G + Dj] € Po 


for all diagonal matrices D, = 0 and D, 2 0. 


2.16 Proof of Theorem 6 


Since det G ¥ 0 and M~’G € P, for all M © 3, it follows (see the 
proof of Theorem 7 of Ref. 2) that 


M“(I + GR)’GE P, 
for all M € 3. 
Suppose that for some 7 € 3 and some D, = 0 and D, 2 0 


(T + D,)"[C + GR)"G + Di] € Po. 
Then, following the proof of Theorem 5, we would have 
det {M-*(I + GR)'G + (A+ D)} = 0 


for some AZ € 3 and some diagonal matrix (A + D) with positive 
diagonal elements, which is a contradiction. 0 
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A Charge Control Relation for 


Bipolar Transistors 


By H. K. GUMMEL 


(Manuscript received August 1, 1969) 


We give a relation which links emitter and collector junction voltages, 
V., and V., , collector current I, , and the total charge Q, of carriers that 
enter through the base terminal (electrons in a pnp transistor): 


é —€ 


Qs 


This relation is valid for high injection conditions, subject only to minor 
restrictions. The significance for device modeling is discussed. 


(aVeb/kT) (aVcb/kT) 
I, = const . 


I. INTRODUCTION 


A basic concept of charge control theory is that the controlled current 
(collector current) equals controlling charge (base charge) divided by a 
transit time.’ This paper presents an additional relation which links 
base charge and collector current with junction voltages. The validity 
of this relation is subject only to minor restrictions. When these are 
met the relation holds even under high-level injection conditions. 

Section II presents a derivation of the new charge control relation. 
Equation (15) states the principal result. The discussion in Section III 
points out the significance of this relation for bipolar transistor models. 


II]. DERIVATION 
Consider a one-dimensional transistor of pnp polarity. The hole 
current density is given by 
jp = quip — qDp’ (1) 


where the symbols have their customary meaning. We shall assume that 
diffusivity D and mobility yu are related through the Einstein relation: 


115 
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Approximation (a): 


k 
D=— 
q Lb 
Approximation (b,): It is assumed that electric fields are low enough 


for avalanche multiplication of carriers to be negligible. 


Approximation (b,): ‘lhe velocity-field relation is idealized by the 
field dependent mobility expression: 


Mo 
= 1 + elzl 

where uo = gD,/kT is the low-field mobility, considered for convenience 
independent of doping, and where », is the scattering limited velocity. 
Approximation (b;) places an upper limit on allowable bias. It is known 
(see for example Ref. 2) that D is underestimated at high fields by 
approximations (a) and (b,) and that approximation (b,) yields too 
gradual a transition from low field velocities to the high field saturated 
velocity.*'* Nevertheless, approximations (a) and (b) afford significant 
simplifications in the treatment to follow and are retained for that 
reason. To the extent that our final result, equation (15), is affected 
by them, it must be considered approximate. The errors depend on 
bias and doping profile; they are not expected to exceed a few percent 
for typical situations. The error due to approximation (b6,) overempha- 
sizes velocity saturation effects and may be alleviated by choice of 
values of v, larger than the final saturation value in high field regions. In 
high-field regions the current is carried predominately as drift current, 
with a carrier concentration that is nearly constant in such regions, so 
that errors in D are of minor consequence. 

Next we define a quantity a(x) which is the ratio of the hole current 
density at position x to the current density j, leaving the collector 
terminal 


a(x) (2) 


| _. (2), 
Je 
For direct current conditions, considered here, a approaches unity at 
the collector and is 1/a = (1 + 8)/6 at the emitter. For large common 
emitter current gain 8, a differs negligibly from unity. 
We use 


B= (3) 
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where y is the electrostatic potential in units of the Boltzman voltage 
kT/q, and consider equation (1) as a differential equation for p(x). Its 
solution, when p is specified at a point 2, , is 

p(t) = p(x,)e" eP-¥ = oD. a(t)e’P-¥ dt 


=. es, 1 | pW v2) 
de [ a(t) | ¥(i)’ Je dt. (4) 


Equation (4) is valid for any pair of points x, and x. Denote by zz and 
cq the outside edges of emitter and collector transition regions, and 
use equation (4) with x, = zz and x = x; . Multiplication of equation 
(4) by e*™ and use* of 


(2) = ner7 (5) 
na) Se? ee (6) 


where ¢, and ¢, are hole and electron quasi-fermi levels in units of the 
Boltzman voltage, yield 


; 


E 


gD ni(e???® = QP ?(t c)) 


z (7) 
: a(t)n,e"'? dt + ms [ : a(t) | W'() | e% dt 


je = 


We shall now show that the second term in the denominator is negligible. 
The integrals in the denominator obtain the largest contribution from 
the region near z,, where (x) attains its maximum value y,, . If in the 
second integral we replace a(t) by its value a,, at x, , and if we neglect 
e*®) and e**© in comparison with e’"—all very reasonable assump- 
tions—we obtain for the second integral in the denominator 


Approximation (c): 


Dats f a(t) | p(t) |e”? dts ate ev", 

For an assessment of the relative magnitude of the terms in the 
denominator of equation (7), consider that in a region of width w the 
potential y(x) does not differ markedly from y,,; such region is con- 
ventionally called the “base” of the transistor. Consider high current 
gain, that is, a a, = 1. Then the value of the first integral is wn,e””, 
compared with (2D,/v,) ner” for the second. The quantity 2D,/v, has 
units of length and is ~ 200 A for silicon. This length is small compared 


* Equation (6) is defined for later reference. 
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to base widths of today’s most advanced transistors and hence we will 
neglect the second term in the rest of this paper. Conceivably, future 
transistors may have narrow enough bases that the term will have to 
be kept. 


Approximation (d): 


n 


D. pte pes rte oe 
“Se fal) |W le dt] a(dne” dt. 
Us cf ztE 


If in equation (7) we were to let v, — ~, that is, considered the 
carrier velocity to be strictly proportional to the electric field, then 
approximation (d) would be implemented automatically. Note, how- 
ever, that in making approximation (d) we do not imply v, = ©, nor 
do we neglect essential consequences of the finiteness of v, . A low 
value of v, manifests itself in substantial base widening at high currents, 
that is, in influencing y(¢) in the remaining (first) term in the denomi- 
nator of equation (7). In view of the idealized textbook treatments in 
which the minority cartier concentration at the base side of the collector 
depletion region is set equal to zero, rather than to a finite value, the 
following statement of equation (7) may be of interest: For low injection 
(that is, for currents sufficiently low that the base width is independent 
of current) the effect of the finiteness of the scattering limited velocity 
on the de collector current is equivalent to a base widening of 2D,/», . 


We now make the approximation: 
Approximation (e): 


The value of the electron quasi-fermi 
level in the base is constant. 


A gradient in the electron quasi-fermi level in the region where electrons 
are majority carriers would cause appreciable electron current to flow; 
for transistors of reasonable current gain, such currents are negligible. 
Thus, approximation (e) is very reasonable. We denote this value of the 
electron quasi-fermi level by ¢,, and divide numerator and denominator 
of the right side of equation (7) by exp (¢,.). We define the emitter-base 
and collector-base junction voltages by 


Vice 7 ONCE eee 8) 


eS Go Saal. (9) 
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These voltages differ from terminal voltages by ohmic drops, primarily 
lateral ohmic drops in the base region. The first integral in the denomi- 
nator of equation (7), after it is divided by exp (¢,,), contains very 
nearly the total area density of electrons. 


Approximation (f): 
: n,alt(er oe) dt = fl a(t)n(t) dé. 


The integrands outside the base region differ, since there the quasi- 
fermi level is position dependent and does not equal ¢,, , but the contri- 
bution to the integral outside of the base region is negligible. By de- 
fining an average value (a),, of a, 


i ** a(in(é) at 
(@),, = 2. (10 
/ n(t) dt 
zR 
We may write expression (f) as 
ni im ate" '" di = — ‘Gs G (11) 


where q, is the total charge, per unit area, of those mobile carriers 
associated with the base terminal, that is, electrons in a pnp transistor. 
Equation (7) with approximations (d) and (f) may be written 


_ (FD Mian _ yeas Aleit 


q m2 


jc = 
We now change from current and charge densities to current and charge. 
We chose the sign of the collector current according to the convention 
that an electric current entering the device is positive: 


I, = —j.A (13) 
Q = HA (14) 


where A is the device area. Note that the sign of Q, is such that an 
electric current flowing into the base tends to increase Q, . This is the 
proper sign for charge control theory. Equation (12) can now be written 


I, = ey ee (15) 


gla es/kr) — plaver/kT) 
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with 
(qn; A)’ D, 
C= 4. 
(@)av 


Equation (15) is the principal result of this paper. Note that Q, de- 
pends on bias, and that the form of the bias dependence is governed 
by the doping profile. However, the relation among the quantities 
I,, Ves, Ves , and Q, in equation (15) is independent of the details of 
the doping profile, provided that assumptions (a) through (f) are valid. 


(16) 


III. DISCUSSION 


In spite of the simple appearance of equation (15)—indeed because of 
it—it provides a powerful tool for transistor modeling. It may be 
written in the form 


ve = seq (et On” _ 1] ote Glee? _ 1] (17) 
with 


Cc 
Q22 = —Agq, = Q, (18) 


which is the form of one of the Ebers—Moll equations.” But whereas in 
the Ebers—Moll equation the coefficients a., and a,, are constant, they 
depend according to equation (18) on bias through the base charge Q, 
(C depends on bias only through (a),, which for 8 > 1 is nearly unity 
and varies little with bias). It is this bias dependence of Q, which con- 
tains high-injection effects. Thus, use of equation (15) holds promise for 
transistor modeling of improved accuracy. The major bias dependence 
of the collector current is through the exponentials in the numerator 
of equation (15). These are “‘ideal’’ exponentials (unity emission coeffi- 
cients) and involve no approximations. The actual modeling is now 
done on Q, in the denominator. A bipolar transistor model using this 
approach will be presented in a later paper. 
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Rain Attenuation and Radio 


Path Design 


By C. L. RUTHROFF 


(Manuscript received August 12, 1969) 


This paper describes the application of the rain attenuation theory of 
Ryde and Ryde to the design of Radio Systems. It shows that an upper 
bound on the outage time due to rain attenuation can be computed from a 
measured point rain rate distribution. The paper also describes a suttable 
rain gauge. 


I. INTRODUCTION 


Heavy rainfall on a radio path absorbs and scatters power transmitted 
at frequencies above 10 GHz and causes large fading of received signals. 
At 20 GHz, for example, the attenuation due to a uniform rain rate of 
100 mm/hr is about 10 dB/km. Rain attenuation is so severe at these 
frequencies that for some applications transmission paths must be re- 
stricted to a few kilometers or less rather than the tens of kilometers 
common at lower frequencies. Since the cost of a radio system increases 
with the number of repeaters it is important to use the longest path 
allowed by the transmission objectives. This path length can be deter- 
mined accurately only if the fading outage due to rain attenuation can 
be predicted. 

Bussey estimated fading statistics on a microwave path from point 
rain rate data.’ He used the rain attenuation theory of Ryde and 
Ryde’~* to convert rain statistics to fading statistics and since 1950 his 
results have been used in the design of radio systems.” However, as 
operating frequencies increase and path lengths get shorter, increased 
precision of fading estimates is required for optimum radio system 
design. 

Over the years a number of experiments have been performed in 
which attenuation was measured on a path at specific times and com- 
pared with values computed from rain rates measured by rain gauges 
spaced along the path near ground level. Here too, the theory of Ryde 
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and Ryde was used. In general there is wide disagreement between com- 
puted and measured values. Also, Medhurst questions the validity of 
the application of the theory to a practical rainfall situation.° Since 
the theory was derived for uniform rain the question is a good one— 
rain on a microwave path is seldom uniform. 

In this paper the theory of attenuation by uniform rain is applied to 
the practical rainfall situation. The radio path is defined as the volume 
of the first Fresnel zone and the rain attenuation is assumed proportional 
to the number of raindrops in this path volume. Of course, this expres- 
tion reduces to that of Ryde and Ryde when the rain is uniform. 

Rain rate is a vector which can be written as the product of a rain 
density and the velocity of raindrops. Rain density is proportional to 
the number of raindrops per unit volume. 

Since rain rate is a vector the expression for attenuation in terms of 
rain density in the path volume can be transformed by the divergence 
theorem into an expression for attenuation as a function of the rain rate 
on the surface of the path volume. From this formulation the following 
results emerge: 


(<) A natural definition of rain rate which is appropriate to the radio 
situation. 

(77) A time interval, 7, , exists during which no significant fade can 
occur. 7’, is determined by the path length, the frequency of operation 
and the speed of raindrops. 

(iit) A rain gauge is described which is suitable for measuring rain 
rate in accordance with the definition mentioned in (2). 


By applying these results, an upper bound on the outage time due to 
rain attenuation is derived. The bound can be computed from a meas- 
ured point rain rate distribution using the results of uniform rain 
theory. The bound can be made tight by the proper choice of rain rate 
integration time interval. A method of estimating this interval from 
measured path loss distributions is given. 


II. GENERAL CONSIDERATIONS 


2.1 The Radio Path 


The radio link consists of two narrow-beam antennas pointing directly 
at each other over a distance of a few hundred to a few thousand meters. 
The space, or volume, of the path is taken to be the first Fresnel zone.’ 
This means that only the energy confined to that volume contributes 
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significantly to the total energy collected by the receiving antenna. 

The first Fresnel zone is a long, thin, prolate ellipsoid of revolution. 
For a path of length Z at wavelength \, it has a major axis Z and equal 
minor axes (AL)? and is terminated at the ends by the antennas. The 
radio path is defined as the volume enclosed by the first Fresnel zone 
and the two antennas. Figure 1 is a sketch of the path. When we speak 
of rain falling on the path we mean rain falling through this volume. 


2.2 Rain Rate as a Vector 


A theory of rain attenuation has been formulated by Ryde and 
Ryde,”* and others, and a good account of it is given by Medhurst.° 
The attenuation in a radio path depends upon the number and size of 
the raindrops and not explicitly upon their speed or direction. But the 
quantity usually measured is rain rate and it does depend on the speed 
and direction of the raindrops. Since rain rate is the product of a density 
and a velocity, it can be interpreted as a vector. 

Let there be a uniform distribution of Np drops of water per cubic 
centimeter in the space between two antennas. The drops are spherical 
with diameter D and velocity vp . The fraction of volume occupied by 
water is defined as the rain density 


pp = 5 No Dd. (1) 


Rain density is a dimensionless, real, nonnegative quantity. The rain 
rate for drops with diameter D and velocity vp is 


Rp = ppp. 


The direction of the rain rate is the direction of travel of the drops. 


ANTENNA ANTENNA 





Tig. 1— Radio path. 
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The vector expression for rain rate is, therefore, 
Rp = ppVp , (2) 


where a boldface letter denotes a vector quantity. 
In general a rain storm has drops of many diameters and the total 
rain rate is a summation over the drop diameters present. 


R = 2 PoVo.- (3) 
2.3 Attenuation and Rain Density 


For a rain consisting of uniformly distributed drops with diameter D 
the attenuation of radio waves with wavelength } is® 


Nypn 
Qa 





Attenuation = 4.343 Ap X 10°dB/km (4) 


where A> is a function of the drop diameter, the wavelength, and the 
dielectric constant of water. Substituting from (1) the attenuation is 


ap = ka, D)Lpp dB, (5) 
where 
2 
k(d, D) = 3 X 4.343 *A2 x 10°. 
a D 


The result in (5) is extended to the case of nonuniform spatial distri- 
bution of raindrops by replacing the uniform rain density in (5) with 
the average rain density in the radio path. The path attenuation is 
therefore assumed to be 


RG, Dz if | a haycedV dB. (6) 


This expression reduces to (5) for uniform rain density. 

There are neither sinks nor sources of rain in the radio path so, for 
constant drop diameter D, the hydrodynamic equation of continuity 
which applies is® 


fe] 
V -(ppVp) + ap = 0, 


where the time and space variables have been omitted for convenience. 
Substitution into (6) gives 
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and transforming to a surface integral by the divergence theorem’ we get 
trl) — 1a, D) | [—povorddl, 7) 

where the integral is over the surface enclosing the volume V. 

Expression (7) relates point rain rate to path attenuation. The vector 
differential dé has a magnitude equal to the differential area of the 
surface S, is normal to it and directed outward. Consider a small area 
on the surface S. The rain rate at this point is ppvp ; the component of 
this rain rate which is normal to the surface and directed into the path 
volume is —ppVp-dé/| dé |. The integral over the surface S is the in- 
crease in water density on the path in unit time which causes an increase 
in attenuation. 

Let the surface S enclosing the radio path volume V be described by 
orthogonal parametric curves on the surface.’° A point P(z, y, 2) on the 
surface can be written P(u, v) where, 








(AL)? . 
c= sin U COS V 
2 
=> L Cos U 
y= 9 
(AL). 
.= 2 sin usin v. 


With this transformation, and substituting Rp from (2), (7) becomes 


tn — 10, D) F ff [-Row, v5 9)-ds, (7a) 


where ds is the transformed vector surface differential. Integrating both 
sides over time 7 and dividing by T results in the following equation. 


mlcetaalt aa § f(b [” rant} 
(8) 


where the unit vector r is defined by Rp(u, v; t) = Ry(u, v; t)r. 
The left side of (8) is an approximation to the rate of change of 
attenuation since, by definition, 
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dap(t) _ ,. 

ay 
Whether attenuation or rain rate is measured, the measuring instrument 
has some time constant, or integration time, 7’. Since, in a microwave 
system, an outage of a millisecond may be significant, an important 
question arises—can we be sure that the measurement will give us all 
the important data? In other words does an integration time 7, exist 
such that for 7 < T, , (8) is a good approximation to (7a)? The ex- 
istence of 7, is justified by the following physical argument. 

For t « t, let there be no rain on the path. At ¢ = ¢, let the rain rate 
at every point on the surface S be &, = pov) and be directed inward in 
the direction of the shortest line from the surface to the path axis. In 
order that a substantial fade occur, the rain will have to travel a sub- 
stantial fraction of the shortest distance from the path surface to the 
path axis. The average of this distance is (\L)?/3 and the average time 
required for the rain to reach the axis is (\L)*/3v, . An integration time 
T, < (L)?/3v, is therefore sufficient since substantial fades will not 
occur in times less than this. If 7’ is chosen small enough, say T = T, , 
(8) is a good approximation to (7a), and can be written 


The quantity in brackets is the rain rate at the point (u, v) averaged 
over time 7’, and defines a suitable rain rate measurement. A rain gauge 
which measures rain rate in accordance with this definition is described 
in the Appendix. If the rain rate is known at every point on the surface 
of the radio path the time derivative of attenuation can be computed 
from (9). 

Since rain rate cannot be measured at all points on the surface of the 
path we consider what can be done with a more reasonable experiment— 
one or more rain gauges near the ground in the vicinity of the path. 
No satisfactory theoretical expression has been derived for computing 
attenuation from rain rate measurements made near the ground in the 
vicinity of the path. And the wide disagreement between computed 
and measured attenuations in experiments of this type described in the 
literature support the conclusion that the empirical expressions used 
are also unsatisfactory.” Also, the requirements of the sampling theorem 
must be met if the attenuation is to be computed from sampled rain 
rates.'’ Visual observation of rainfall reveals a spatial structure so fine 
that an unreasonably close spacing of rain gauges would be required to 
meet these requirements. 


len(t + T) = apd), 
T 
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III. POINT RAIN RATE AND PATH ATTENUATION DISTRIBUTIONS 


3.1 Important Assumptions 


Fortunately, the attenuation as a function of time is not required 
for the design of radio links; for radio link design the fraction of time 
that the path attenuation exceeds the fading margin is the important 
parameter. Thus, we need only a suitable statistic of path attenuation 
which can be related to a similar statistic of rain rate at ground level 
in the vicinity of the path. In this section a point rain rate distribution 
function is defined and related to the path attenuation distribution 
function. 

Let the rain rate be R(u, v; t) on the surface S of the path. At some 
point near the path—on the ground beneath the path, for example—a 
rain gauge measures a sequence of rain rates given by 


tot (nt1)T 

RAT, K, 2,42) =] [-R@v29-Kat 0) 
where k is a unit vector normal to the collecting surface of the rain 
gauge and 2, y, 2 are the space coordinates of the rain gauge. Suppose 
that rain rate measurements have been made for a very long time. The 
data available is a large number of rain rates F(T, k, 2, y, 2), one for 
each interval 7’. The data is organized by choosing a rain rate R, and 
computing the fraction of intervals T for which the rain gauge recorded 
rates less than FR, . This fraction is denoted by 


P(L,(T,-K; 2, y, 2) S Ral, (11) 


and is a point rain rate distribution function; it is a function of R, , the 
integration time 7’, the location of the rain gauge, and the pointing di- 
rection k. The optimum direction for k is expected to be vertically up- 
ward in many regions; in any case, the rain gauge must be pointed in the 
direction k such that for the high rain rates of interest, that is, for 
fie > digs 


PIF,(T, k, a; Y; z) Ss Rol Ss PIR,(T, l, wv, Y; z) Ss R.], 


where | is any unit vector. 

For the radio systems of interest, the rain rate, R,,, may be chosen 
at least one order of magnitude greater than the mean rain rate. In 
this country the mean rain rate is on the order of 0.1 mm/hr whereas a 
reasonable value of R,, may be 10 mm/hr. 

Let the radio path be divided into a large number, n, of volume 
elements such that the rain rate is uniform in each element. The average 
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rain rate on the path is 


Rave(t) S : DRG; ] Yi » 8: } t). (12) 
t= 
In integral form this is written 
1 od 
R (= | (nr u.2z: bad 
~vaverv*/ V i Ra, Y; 23 t) dV. (12a) 


Two assumptions are made: 
(<) In a region containing the radio path the point rain rate dis- 
tribution function is independent of position and (11) can be written 


PIR,(T, k, 2, y, 2) S Ro] = PIR,(T) S R,]. (13) 


(iz) For the rain rates of interest, that is, for R, > R,,, and for 
integration time 7’, the distribution function of the average rain rate 
on the path is greater than, or equal to, the point rain rate distribution 
function. 


P[Ra(t, T) = KR.) 2 PIR,(T) S £.], (14) 


where, 
it t+T7 
Realty T) = 5 / Rese(t) dt. 
t 


The first assumption is that the point rain rate distribution function 
is the same whether it is measured below the path, on the path, or 
near the path. It does not mean that the rain rate at any time ¢ is the 
same everywhere in the region—an essential distinction. The assumption 
does not mean that rain rate is a stationary random process in either 
the wide sense or the strict sense. In statistical language it means that 
the rain rate can be considered a first order stationary process over a 
small interval.” The second assumption reflects Bussey’s observation 
that high rain rates extend over smaller areas than do lower rain rates. 


3.2 A Bound on the Path Attenuation Distribution 


If the speed of the raindrops does not change while in the radio path, 
the attenuation can be written in terms of rain rate. From (6), 


ap(t) = HO) iff Rol(z, y, 2; }) dV. (15) 


Let the rain have the Laws and Parsons distribution of drop diam- 
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eters. Then 
Ro, y, 2; t) = R(x, y, 2; po, (16) 
where pp is the fraction of water in the rain consisting of drops of 


diameter D. Substituting into (15) and using the definition (12a) the 
expression for attenuation is 


kQ, D 
apt) = = O AOS) Raul. (17) 
’ The total attenuation is 
k(, D 
a(t) = LR,,.(d) >> ED). (18) 
D Up 


The quantity represented by the summation has been computed by 
Ryde and Ryde and by Medhurst for the Laws and Parsons drop 
diameter distribution and for the terminal velocities of water drops in 
still air.° 

In Section 2.3 we showed that negligible changes occur in a(é) in a 
time T S T,. Thus, (18) can be written 


ald) alt, 7.) = LRawlt, 7.) 5, | (19) 


where 
1 t+T7 
a(t, T) = 5 | a(t) dt. 
The path attenuation for a uniform rain rate R, is, from (18), 
a, = LR, > Oe ; (20) 
D 


UD 
The desired bound can be found by substituting from (19) and (20) 
into (14). 
Pla@) S a,)] = P[R,(T.) S &,]. (21) 
This bound says that if the measured point rain rate distribution is 
converted to an attenuation distribution by (20), the outage time 


predicted is greater than, or equal to, the outage time that occurs on 
the path. The application of this bound is illustrated in Section V. 


IV. EXPERIMENTAL DETERMINATION OF INTEGRATION TIME 


It was shown in Section 2.3 that if the rain rate integration time 7 
is short enough no significant fades will be missed; specifically if 7 = 
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T, < (AL)?/3v, , it is small enough. This determination of T, is based 
on considerations as to what could happen on a path. Fades may occur 
slowly compared with J, so it is worthwhile to determine whether a 
larger integration time may be practical. 

Suppose that path attenuation and point rain rates have been meas- 
ured with an integration time 7 < T,. Then the distributions com- 
puted from the measurements will satisfy the inequality (21) which 
can be written 


Plae@, T) Sa) 2 PIRAT) SR), TST.. (22) 


Now, from (14), this inequality holds for any JT and since 7’ S T, all 
fading of significance is included. 

Let the attenuation distribution be computed from the path loss 
measurements for integration times of 7, 27, 37, --- , as long as the 
distribution remains substantially unchanged. The point rain rate 
distributions computed for the same integration times are such that 
the inequality holds and 


Plat, mT) = a] 2 P[R,(mT) = £,], (23) 


for all m = 1, 2, 3, --- . Now suppose that Pla(t, mT) S a,] remains 
substantially unchanged for all integers m up to M. Then all of the 
corresponding rain rate distributions result in valid upper bounds on 
outage time as shown by (23); one of these will be the least upper 
bound. 

From experiments of this kind, practical values of integration time 
can be determined. It is expected (but not proven) that the integration 
time which results in the least upper bound will be the largest value 
for which the attenuation distribution remains unchanged, that is, MT. 


Vv. DISCUSSION 


Experimental rain rate distributions which meet the requirements 
of this theory are not available. For this reason a careful experimental 
verification cannot be made now. There is reason for optimism, however. 
The upper bound on outage time computed from Bussey’s’ one-minute 
point rain rate distribution is remarkably close to the one-minute 
attenuation distributions reported by Semplak and Turrin.’*""* These 
distributions are shown in Fig. 2. Semplak and Turrin also report that 
the attenuation distribution remains unchanged for shorter integration 
times.** 

The simplest application of this theory to the design of a radio path 
requires only a point rain rate distribution measured as described in 
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Fig. .2—Comparison of computed and measured attenuation distributions for 
one minute integration time. 


Section III. In this case the rain rate integration time is computed 
from the frequency, the path length, and the maximum raindrop veloc- 
ity. This measured point rain rate distribution can be converted to 
an upper bound on attenuation outage time by use of expressions 
(20) and (21). The upper bound obtained this way is not necessarily 
the least upper bound whereas the least upper bound is required for 
optimum radio system design. The least upper bound can be computed 
from the point rain rate distribution if the corresponding optimum rain 
rate integration time is known; the optimum rain rate integration time 
can be determined from a path attenuation experiment as described 
in Section IV. 

It is important, then, to determine the optimum integration times 
in those regions of the country where radio systems above 10 GHz may 
be used. The optimum integration time is a function of wavelength, 
path length, and the climate in which the path is located. A few path 
loss measurements, located in different climatic regions, would yield 
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valuable results. In these experiments attenuation distributions would 
be measured as functions of path length and wavelength, and optimum 
integration times computed from this data and the measured point rain 
rate distributions. It may be, for instance, that the optimum integration 
times are about the same everywhere; if so, the path loss experiments 
would show it and, thereafter, only point rain rate measurements 


: 
would be required. 


The accuracy with which the least upper bound predicts the outage 
time on a radio path cannot be stated precisely until further experi- 
mental data is available. It may be anticipated, however, that the 
accuracy will decrease as the path length increases. For example, if 
the path is long compared with the dimensions of thunderstorms, the 
least upper bound prediction will probably be pessimistic, especially 
for large fades. On the other hand, for paths shorter than the dimensions 
of thunderstorms the least upper bound prediction may be accurate. 

A rain gauge, from which the rain rate in each fixed integration 
interval can be determined, is suitable for the determination of the 
point rain rate distribution function defined in Section ITI. Morgan 
has built and described such a rain gauge recently, and another rain 
gauge proposed for this purpose is described in the Appendix.”° 

No attempt has been made to include the effects, on attenuation, of 
raindrop distortion, temperature, radio wave polarization, and so on, 
in this theory. 
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APPENDIX 


An Instrument for Measuring Point Rain Rates 


The instrument described measures rain rates in accordance with 
the definition of (9). Figure 3 illustrates the basic element in the rain 
rate instrument—a depth gauge consisting of a funnel, a cylindrical 
capacitor, and a shutter for draining the capacitor. The funnel is exposed 
to the rain for a specified time 7’ and then covered. The rain which 
passed through the collecting area of the funnel drains into the cylin- 
drical capacitor and forms a column of water as shown. The dielectric 
constant of water increases the capacitance—the greater the height of 
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Fig. 3 — Capacitor depth gauge. 


the water column, the greater the capacitance. When the funnel has 
completely drained, the capacitance (and hence the volume of water) 
is measured. 

There are two features of importance about this depth gauge. 


(¢) The interval of exposure, or integration time, can be determined 
precisely by careful operation of the shield over the funnel, and is not 
dependent on the rain rate. 

(iz) The time allowed to drain the funnel and measure the capacitance 
is independent of the exposure interval 7’. Sufficient time can be allowed 
to drain the funnel and stabilize the water column prior to measuring 
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the capacitor. This eliminates fluctuations due to the random behavior 
of water flow on the surface of the funnel. 

When the shield is over the depth gauge no rain is collected. The 
complete rain gauge consists of a number of depth gauges arranged as 
shown in Jig. 4. In this illustration, ten depth gauges are shown, with 
number 2 in position to collect rain. When the interval ends, the shield 
rotates, covering depth gauge number 2 and exposing number 3. 

A rotating shutter for draining the depth gauge capacitors is shown 
in Fig. 4B. The shutter is fixed to a common shaft with the funnel 
shield of Fig. 4A. The operating sequence is as follows. There are ten 
time intervals of length T in a single rotation of the shaft. With the 
aid of Fig. 3 the following sequence can be seen to occur. 


Time Interval Status of Depth Gauge 
Number #2 

I Collecting Water Draining Capacitor 
2 Draining Funnel Collecting Water 
3 i Draining Funnel 
4 | i 
5 | | 
6 | 
7 | 
8 Measure Capacitance 
9 Draining Capacitor Measure Capacitance 

10 Draining Capacitor Draining Capacitor 


The rain shield steps rapidly from gauge to gauge. There is always 
one gauge collecting rain; one measurement is made in each time in- 
terval T. If the measurement starts at time é, and the rain gauge points 


-ROTATING SHUTTER 
FOR DRAINING DEPTH 
10 ' | GUAGE CAPACITORS. 
_-- DEPTH GAUGE \ =NO.3 AND 4 DRAINING 
NO.2 EXPOSED 
TO RAIN 











CAPACITOR 
CONNECTIONS 
TO 


MEASURING 
CIRCUIT 
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(a) (b) 


Fig. 4(a) Top view of rainrate gauge. (b) Section of rainrate gauge. 
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vertically upward, the output of the instrument is a sequence of meas- 
urements 


tot(nt1)T 


Rut, @, 9,2) = «Rat. 
totn 
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An Improved Thermal Gas Lens for 
Optical Beam Waveguides 


By P. KAISER 
(Manuscript received September 3, 1969) 


The quality of thermal gas lenses can be significantly improved if the 
heated gas is exhausted radially. This type of exhaust is typically created 
in a counter-flow lens where opposing gas flows pass through two closely 
spaced lens sections and exhaust radially through a small gap between 
these sections. We found that the stability of the focusing behavior depends 
on a laminar flow transition region of sufficient length, a properly chosen 
width and shape of the exhaust gap, and on mechanical symmetry and 
smoothness. 

We describe the guiding properties of a single lens as well as a beam 
waveguide consisting of up to eleven counter-flow lenses. Off-axis injections 
of the light beam with displacements amounting up to 45 percent of the 
lens radius resulted only in changes of the beam width (less than 23 percent 
for 11 lenses), whereas the gaussian profile was essentially maintained. 


I. INTRODUCTION 


Gravitational and spherical aberrations in thermal gas lenses are 
known to cause distortions of light beams with gaussian intensity 
distribution.*~° The eventual use of this lens in long distance optical 
communication links depends in part on our ability to improve its 
quality and to develop effective beam control devices which periodically 
free the distorted beam from higher order modes.° 

By analyzing the severe aberrations of the thermal gas lens reported 
in an earlier paper in this journal,* we found that the axial exhaust 
of the heated gas was primarily responsible for the low quality of that 
lens. We demonstrate that a significant improvement of the focusing 
properties of the thermal gas lens can be achieved if the heated gas is 
exhausted in the radial direction. 


II. CRITICAL APPRAISAL OF AN EARLIER THERMAL GAS LENS 


The thermal gas lens used in a recent experiment* was highly aber- 
rated and caused severe beam deformations already for small off-axis 
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injections of a light beam into the lens guide. Attempts to decrease 
the predominantly gravitational aberrations by increasing the flow 
velocity resulted in instabilities of the focusing action. This was the 
apparent consequence of residual turbulent motion of a gas flow with 
nonuniform temperature distribution. We found that the occurrence 
of flow instabilities was attributable to its construction (see Fig. 1). 
Exponential inlet and exhaust regions connected the lens elements 
with a larger diameter spacing tube which served as heat sink for 
cooling purposes. At higher flow rates, the laminar flow transition tubes 
preceding the heated lens elements were too short (stable focusing 
action requires the gas flow to be fully developed and free from residual 
random motion when it enters the heated lens section). For that purpose 
the gas has to pass through a tubular transition region with the same 
inner diameter as the lens and whose minimum length L, depends on 
this diameter and the Reynolds number R,,’ 


L, = 0.04 dR, 
R. = Wav d 
v 


d = lens diameter (=0.635 cm) 
W)aw = average gas velocity 
v = kinematic viscosity (0.15 em?/s for air). 


For an air flow rate of 3 liters per minute and for a resulting R, of 
670, the minimum length is at least 27 lens diameters. However, this 
value is only approximate since the stratification of the flow depends on 
the particular type of injection used and the initial amount of eddies 
present. Furthermore, it depends on the degree of beam stability re- 
quired. 

Another reason that focusing fluctuations occurred was the presence 
of flow separation in the fast expanding exhaust region, which again 
resulted in a fluctuating temperature distribution. Because of its at- 
tractively simple design, we tried to retain the axial exhaust of the 
continuous flow system, and we attempted to improve its shortcomings. 
Through the selection of a longer transition tube, laminar flow can 
be obtained in the entrance region. However, in order to avoid flow 
separation, expansion of the diffuser behind the lens has to occur very 
gradually. Eventually the gas is not cooled back to its original tempera- 
ture by mere heat exchange with an insulated diffuser wall and the 
diffuser would have to serve as a heat sink as well. During the cooling 
process the temperature profile becomes distorted. Thus, any cooling 
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of the heated gas in the path of the light beam appears to be associated 
with large aberrations. With our present knowledge, therefore, their 
practical usefulness seems to be rather limited. 

We can avoid the lens deteriorations which are associated with an 
axial exhaust by exhausting the heated gas radially. This type of exhaust 
is suitably created through the confluence of two opposing flows in the 
counter-flow lens (CFL). This lens consists of two tubular lens sections 
which are separated by a small gap (Fig. 2). Laminar flows of cold gas 
enter the counter-flow lens from both sides and exhaust through the gap 
in the center via a free stagnation flow. The flow pattern and optical 
properties of the counter-flow lens are the subject of the following study. 

The possibility of reducing the spherical aberrations of the thermal 

‘gas lens by using two such lenses with opposing flows back to back was 
originally suggested by Marcuse.’ His theoretical analysis, however, 
neglected gravity forces as well as effects in the exhaust region. An 
experimental comparison between a counter-flow lens and a lens of 
identical length with unidirectional flow, whose radial exhaust at the 
end of the lens was created with a glass plate, revealed no discernible 
difference between their focusing qualities. The theoretically proven 
difference in their spherical aberrations is obviously small and becomes 
apparent only after the passage of many lenses. Nevertheless, it adds 
to the attractive features of the counter-flow lens. Berreman had already 
used the counter-flow principle in his ‘‘chimney lens’, in which the 
flow velocity was determined by free convection.°* 
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Fig. 2— Thermal counter-flow gas lens with radial exhaust. (All dimensions 
are in centimeters.) 
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III. FLOW CHARACTERISTICS OF THE COUNTER-FLOW LENS 


The counter-flow lens used in our experiments consisted of two 
identical lens halves which were mounted independently (see Figs. 2 
and 7). Air was injected into the lens from both sides via 5.08 em diam- 
eter porous tubes (2.54 ecm long). These were followed by 0.635 cm 
diameter flow transition tubes and copper tubes of identical diameter. 
The assembly was surrounded by a 5.08 cm diameter copper tube for 
mechanical rigidity. The use of porous tubes allowed us to reduce the 
length of the transition tubes and still achieve laminar flow. Their 
omission caused unstable focusing and forced us to increase the length 
of the transition tubes beyond that indicated in Fig. 2. We concluded 
at this point that we can shorten the transition length by laminar 
injection of the gas through porous tubes having identical diameters 
as the transition tube and lens. This assumption proved to be true in 
later experiments. 

In a typical mode of operation, the counter-flow lens is closed on both 
sides with glass windows in order to prevent leakage of the gas. In a 
simplified mode of operation, we can omit the windows if we suck air 
from the surrounding atmosphere into the lens by means of a vacuum 
pump connected to the exhaust gap. Aside from its simplicity, the 
latter method allows us to determine the focusing characteristics of 
the counter-flow lens more accurately and without the interference of 
end windows. Note that the length of the transition tubes in this case 
had to be increased to more than twice the length indicated in Fig. 2 
in order to avoid beam fluctuations. 

A properly chosen width and shape of the exhaust gap proved to be 
important for avoiding flow instabilities. When the gap was either too 
small or too large, we observed fluctuations of the focused beam, par- 
ticularly for slightly misaligned lens halves and asymmetric flows. 
We observed most stable focusing when the width of the gap assumed 
approximately the same value as the lens radius. In this case, the radial 
exhaust area amounted to twice the cross-sectional area of one lens. 
As might be imagined, the flaring of the inner diameter near the exhaust 
gap was also noticed to have some influence on the beam stability. 
Tubes having inner diameters rounded off and with a radius whose 
value corresponded to that of the wall thickness (1.6 mm) proved to 
be inferior to a sharp-edged exhaust. But we believe that some round- 
ing of the edges helps to avoid flow instabilities in that region. Simi- 
larly, lack of smoothness of the end faces of both lens halves in the 
exhaust gap was also found to be conducive to instabilities. 
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As a result of this study, we recognize that flow instabilities pose a 
major problem in the operation of the counter-flow lens. This is par- 
ticularly true when we remember a fact well known in the field of fluid 
amplification: depending on the boundary conditions, the flow in a 
rapidly expanding channel can assume several flow patterns. Thus, 
slight perturbations of the flow or the boundary conditions can cause 
it. to change, for example, from an axially symmetric pattern to an 
asymmetric pattern, or to oscillate between two asymmetric patterns. 
Therefore, it seems to be advisable to introduce deliberately a certain 
asymmetry into the exhaust gap. Its specific character must be left to 
future investigation, but it is desirable that it does not adversely affect 
the lens quality. 

In studying the instabilities, we limited ourselves to low frequent 
beam fluctuations detectable by visual observation and by an ordinary 
x-y recorder. 


IV. OPTICAL CHARACTERISTICS OF THE COUNTER-FLOW LENS 


The counter-flow lens is put into operation by establishing the proper 
gas flow rate and heating the lens elements to a temperature which 
yields the desired focal length. The temperature difference between wall 
and inlet air (at room temperature) was limited to approximately 
100°C because the foam insulation (Nopcofoam H402N) could not 
tolerate higher temperatures. 

Coaxial alignment of the counter-flow lens with a slightly diverging 
gaussian light beam was accomplished with the aid of a photoelectric 
probe containing a pinhole. The radius of the unfocused beam at the 
center of the lens was 0.66 mm. Supported by theoretical analysis, 
we assumed the center of the counter-flow lens to be the location of 
the principal plane." We measured the profile of the focused beam at an 
arbitrary distance of approximately 4 meters from the lens with the 
aid of an automatic recording system, consisting of a motor-driven 
photosensor with a pinhole, movable in two perpendicular directions, 
an amplifier, and an x-y recorder. 

The focal length was determined by a substitution method: The 
half power width of the focused beam was compared with that of a 
set of glass lenses with known focal lengths. These were placed into 
the beam instead of the counter-flow lens, yielding a calibration curve 
beamwidth versus focal length. The focal length f as function of the 
temperature difference AT’ between wall and inlet air for different 
flow rates F' (with F'/2 flowing in either lens half) is shown in Fig. 3. 
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_ Fig. 3— Focal length and power consumption of the counter-flow lens as func- 
tion of the temperature rise for different air flow rates, F. (QO — 5.25 L/min; 
CO — 5.75 L/min; A — 6.25 L/min.) 


We selected the respective flow rates because we found them to be 
associated with smallest aberrations as we will see below. We notice 
that at above values the focal length is essentially independent of the 
flow rate and depends mainly on the temperature rise. Theoretical 
analysis’ has shown that also without the consideration of free convec- 
tion effects, minimized focal length distortions coincide with flow rates 
for which the focal length assumes minimum values. — 

The power consumption P of the counter-flow lens increases as 
function of the flow rate as shown in Fig. 3. For a focal length of 0.40 m, 
the required input power was 7.08 W (F = 6.0L/min, AT = 61.8°C). 
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The same temperature difference without gas flow requires an input 
power of 2.48 W. The difference of 4.60 W is thus necessary to heat 
the gas. The theoretical value for the heat transferred to the gas at 
above AT is given by’? 


P,, = ckL AT = E — 0.820 exp (—7.316 Y)| 





7 
with 
k = thermal conductivity of gas 
L = length of one heated lens element 
Vv, = maximum gas velocity 
y= 
ap C, 


lens radius 
gas density 
Cp = specific heat at constant pressure 


lI 


and is calculated to be 4.65W (k = 6.28-10~° calories/cm second degree, 
L = 15em, AT = 61.8°C, v, = 318 cm/s, a = 0.3175 em, p = 1.21-107° 
gram/em’*, c, = 0.240 calories/gram degree). 


Making use of the fact that the temperature profile of the gas flow is 
still maintained for some distance behind the heated tube, we could 
reduce the power consumption of the thermal gas lens by substituting 
the latter portion of the heated tubes with insulating tubes. 

As mentioned before, for the given geometry, we observed minimized 
lens aberrations at flow rates near 6 liters per minute. We determined 
these optimum flow rates in the following way: a gaussian beam was in- 
jected into the lens with increasing parallel displacements. Since the focal 
length in the aberrated gas lens changes with the radius, we used the 
resulting changes of the beam width as a measure of distortion. Normal- 
ized changes of the width as function of off-axis injections for different 
flow rates are presented in Fig. 4. At flow rates below approximately 
6 liters per minute we found pronounced asymmetry for vertically dis- 
placed beams due to gravity. At flow rates beyond 6 liter per minute 
this asymmetry gradually disappeared, but similar symmetric changes 
of both the vertical and horizontal beam width for growing off-sets 
indicated the increasing presence of spherical aberrations. Near 6 liters 
per minute we recorded smallest changes of the width, but still noted a 
vertical asymmetry. 
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Fig. 4— Relative change of beam width as function of off-axis rie! for 
beams injected above (A) and below (V) the optical axis (f = 040 m, AT = 
62°C, wo = on-axis beam width, a = lens radius). (a) F = 5.5 L/min; (b) F = 
6.0 L/min; (c) F = 65 L/min. 


The method of off-axis injection permits us to probe exactly the 
different portions of the counter-flow lens with their aberrations. A 
somewhat faster, but less detailed. method of determining the optimum 
flow rate is to focus a comparatively large beam which fills a major 
portion of the lens’s cross section. The distorting influence of the lens 
aberrations is thus magnified. The beam radius chosen for that purpose 
was 1.10 mm. The temperature at selected flow rates was adjusted so 
as to result in a focal length of 40 cm, measured as a constant half power 
width of the horizontal profile. As a measure of distortion, we used the 
change of the intensity profile at the 1/10 power point, whereby the 
profile obtained with a high quality glass lens served as reference (Fig. 
5). In ease of the asymmetrically focused vertical profile, we considered 
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Fig. 5— Distortion of an oversized beam as function of the flow rate (f = 
0.40 m). —— reference beam; — — — — deformed beam; normalized at J = 1 
and I = 0.5 with reference beam. 


the changes of the upper and lower half width—counted from the posi- 
tion of the maximum intensity—as a measure of the gravitational aber- 
rations. The approximate symmetric increases for horizontal and vertical 
profiles at higher flow rates are due to spherical aberrations. As with the 
method before, the optimum flow rate is again seen to lie somewhere 
between 5.5 and 6 L/min, where horizontal beam width changes are 
smallest. Temperature differences and the power consumption at the 
different flow rates are shown in Fig. 6. 

The flow rates associated with minimized total aberrations still have 
noticeable gravity aberrations which have to be compensated for. 
For a coaxial alignment of the two lens halves and for the flow rates 
and temperatures of interest, the difference between the mechanical and 
optical axes was approximately 0.20 mm. This compares favorably with 
the 0.5 mm measured with an earlier gas lens with axial exhaust.* 

If we align the counter-flow lens on its optical axis, the focal length 
and power consumption are slightly different from those shown in 
Fig. 3. Note that the optical axis itself depends on the flow rate and 
the temperature difference. 


V. A. BEAM WAVEGUIDE COMPOSED OF COUNTER-FLOW LENSES 


For the purpose of examining further the flow pattern and the optical 
properties, we constructed a beam waveguide consisting of up to eleven 
counter-flow lenses. We supported the lenses in a U-channel in the same 
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fashion as described earlier.* The air was supplied by a 2-inch pipe 
which ran parallel to the guide and had taps with valves as needed 
(Fig. 7). 

The fundamental mode of the beam waveguide was generated in a 
laser cavity which consisted of a curved (R = 0.80 m) mirror and a 
plane mirror, and a 17-cm-long He-Ne discharge tube with a 2-mm bore. 
The ideal beam radius at the center of each lens amounted to 0.3882 mm. 

The first lens was aligned coaxially with the beam at a distance of 
one lens spacing from the curved mirror. Near the laser, the lens was 
closed with a flat, antireflection coated window. On the opposite side 
(behind feed 2; see Fig. 7) the lens was temporarily closed with a glass 
window for the purpose of establishing air flows in both lens halves. The 
lenses were then aligned on their optical axes. The temperatures were 
adjusted such as to produce the same width of the focused beam as that 
of the original beam after the target was moved farther away from the 
laser cavity the equivalent of one lens spacing. By doing this for all 
lenses, we guaranteed periodicity of the guided beam despite variations 
existing between their mechanical and optical properties. Thereafter 
the first half of the second lens was connected with feed 2. We used a 
flexible coupling between the lenses in order to avoid mechanical feed- 
back. After we removed the alignment-probe, we increased the gas flow 
into the second feed to 6 liters per minute. Since the temperature of the 
thermal gas lens is sensitive to changes of the flow rate, and since the 
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Fig. 6— Temperature differences and power consumption as function of the 
gas flow rate for a constant focal length of 0.40 m. 
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temperature of the first lens did not change appreciably after the flow 
into feed 2 was increased, we had an indication for the symmetric sub- 
division of the total flow into both directions. Subsequently, the second 
half of the second lens was aligned and the lens put into operation as 
before. All other lenses were aligned analogously. 

The vertical and horizontal power profiles of the on-axis beam after 
passing eleven lenses, together with profiles of beams which were in- 
jected into the guide with increasing parallel displacements are shown 
in Fig. 8. For beam displacements amounting up to approximately 
45 percent of the lens radius (relative off-axis injection: 0.45), the 
gaussian profile was maintained and only changes of the beam width 
took place. One notes that the changes of the beam width for vertical 
and horizontal off-sets are comparable. 

Due to the different phase relationships existing between the funda- 
mental and higher order modes at successive lenses, we obtain a more 
accurate picture of the higher order mode content by comparing the 
beam profiles after the ninth, tenth, and eleventh lenses of the guide 
(Fig. 9). After nine lenses, the beam width first decreased for growing 
offsets, went through a minimum around 0.3, and approached again its 
original width near 0.45. Then it rapidly increased, and the profile 
deteriorated for larger offsets. As could be expected from the phase 
progression of the second order mode, which is mainly responsible for 
beam width changes, a similar dependence was observed after eleven 
lenses. Here, the maximum changes of the beam width were in the order 
of 10 percent for relative offsets smaller than approximately 0.45. 
Largest changes of the beam width for up to 11 lenses occurred for 10 
lenses and were in the order of 23 percent for relative off-axis injections 
up to 0.45. 

In Fig. 10 we show relative changes of the beam width for a sequence 
of 10 gas lenses, and also for a single lens with axial exhaust for com- 
parison (see Fig. 1 and Ref. 4). 


VI. CONCLUSIONS 


The focusing properties of a thermal gas lens with radial exhaust 
were shown to be distinctly superior to those of a lens with axial ex- 
haust. Laminar flow transition tubes of sufficient length and a properly 
chosen width and shape of the exhaust gap resulted in a flow pattern 
with stable focusing action. It also permitted us to increase the gas 
flow rate to a value at which the combined gravitational and spherical 
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Fig. 8 — Vertical (——) and horizontal ( — — — —) power profiles of beams 
injected off-axis into a beam waveguide composed of 11 counter-flow lenses (f = 
0.40 m, distance/focal length = 1.8). 


aberrations were minimized. For a total length of the heated lens sec- 
tion of 30 cm (Fig. 2), the optimum flow rate was near 6 liters per 
minute (diameter of the lens: 0.635 cm). At this flow rate an input 
power of 7 watts was required to maintain a temperature difference of 
62°C between the inlet air and the wall, resulting in a focal length of 
0.40 m. An input power of 2.5 watts was required to establish the same 
wall temperature without gas flow. 
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Fig. 9— Relative change of beam width as function of off-axis injection into a 
guide composed of (a) 9, (b) 10, and (c) 11 counter-flow lenses (f = 0.40 m, 
D/f = 18, wo = on-axis width, a= lens radius). Beam injected above (A), below 
(V), and right and left (() of ‘optical axis. 


The guiding properties of a beam waveguide consisting of up to 
eleven counter-flow lenses were found (focal length 40 cm, lens spacing 
to focal length ratio 1.8): off-axis injections of the light beam with 
displacements amounting up to 45 percent of the lens radius resulted 
only in changes of the beam width (less than 23 percent for up to 11 
lenses), whereas the gaussian profile was essentially maintained. In 
comparison, a beam passing through thermal gas lenses with axial ex- 
haust was completely distorted when injected with similar displacements. 

Thus, the counter-flow lens was proven to be a gas lens of superior 
quality which warrants its use in a lens guide encompassing a larger num- 
ber of these lenses. In view of this, several improvements and further 
studies are presently under way which we will report on at a later date. 
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Fig. 10 —Relative increase of beam width for a beam injected off-axis into gas 
lenses with axial exhaust (f = 0.50 m, F = 1.5L/min, AT = 110°C). Beam injected 
above (A), below (\/), and right and ‘eft (CL) of optical axis. 
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